NAVAL POSTGRADUATE SCHOOL

MONTEREY, CALIFORNIA


THESIS 


A  FIELD  PROGRAMMABLE  GATE  ARRAY  BASED 
SOFTWARE  DEFINED  RADIO  DESIGN  FOR  THE 
SPACE  ENVIRONMENT 

by 

Jeremy  V.  Livingston 
December  2009 

Thesis Advisor: Frank E. Kragh

Co-Advisor: Herschel Loomis


Approved  for  public  release;  distribution  is  unlimited 


REPORT  DOCUMENTATION  PAGE 


Form  Approved  OMB  No.  0704-0188 

Public  reporting  burden  for  this  collection  of  information  is  estimated  to  average  1  hour  per  response,  including  the  time  for  reviewing  instruction, 
searching  existing  data  sources,  gathering  and  maintaining  the  data  needed,  and  completing  and  reviewing  the  collection  of  information.  Send 
comments  regarding  this  burden  estimate  or  any  other  aspect  of  this  collection  of  information,  including  suggestions  for  reducing  this  burden,  to 
Washington  headquarters  Services,  Directorate  for  Information  Operations  and  Reports,  1215  Jefferson  Davis  Highway,  Suite  1204,  Arlington,  VA 
22202-4302,  and  to  the  Office  of  Management  and  Budget,  Paperwork  Reduction  Project  (0704-0188)  Washington  DC  20503. 

1. AGENCY USE ONLY (Leave blank)
2. REPORT DATE: December 2009
3. REPORT TYPE AND DATES COVERED: Master’s Thesis
4. TITLE AND SUBTITLE: A Field Programmable Gate Array Based Software Defined Radio Design for the Space Environment
5. FUNDING NUMBERS
6. AUTHOR(S): Jeremy V. Livingston
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES): Naval Postgraduate School
8. PERFORMING ORGANIZATION REPORT NUMBER
9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES): N/A
10. SPONSORING/MONITORING AGENCY REPORT NUMBER
11. SUPPLEMENTARY NOTES: The views expressed in this thesis are those of the author and do not reflect the official policy or position of the Department of Defense or the U.S. Government.
12a. DISTRIBUTION/AVAILABILITY STATEMENT: Approved for public release; distribution is unlimited
12b. DISTRIBUTION CODE

13.  ABSTRACT  (maximum  200  words) 

This  thesis  focuses  on  a  software  defined  radio  (SDR)  designed  to  compress  a  wideband  radio  signal  input  for  a 
narrowband  signal  output.  The  design  is  based  on  a  Field  Programmable  Gate  Array  (FPGA),  which  is  chosen  for  its 
reprogrammability,  flexibility,  and  our  ability  to  introduce  fault  tolerance  into  the  design.  Software  design  tools 
allowed  programming  to  be  done  at  a  high  level,  thereby  allowing  more  progress  on  the  design.  This  thesis  focuses 
on  one  such  SDR  that  was  designed  at  a  high  level  of  abstraction.  This  thesis  documents  an  analysis  of  the  memory 
and  timing  requirements  of  the  circuit  so  that  it  may  be  used  on  resource-constrained  FPGA  devices.  It  also  explores 
the  operating  capabilities  and  limitations  for  this  circuit  under  various  resource-constrained  conditions  and  introduces 
algorithms  for  fault  detection  to  make  the  circuit  more  compatible  with  the  space  environment. 


14. SUBJECT TERMS: Data Compression, Signal Analysis, Software Defined Radio (SDR), System Generator, Fast Fourier Transform (FFT), Field Programmable Gate Array (FPGA), Xilinx, Virtex™, Error Detection, Parity, Space-Based Computing
15. NUMBER OF PAGES: 129
16. PRICE CODE
17. SECURITY CLASSIFICATION OF REPORT: Unclassified
18. SECURITY CLASSIFICATION OF THIS PAGE: Unclassified
19. SECURITY CLASSIFICATION OF ABSTRACT: Unclassified
20. LIMITATION OF ABSTRACT: UU


NSN  7540-01-280-5500 


Standard  Form  298  (Rev.  2-89) 
Prescribed  by  ANSI  Std.  239-18 




Approved  for  public  release;  distribution  is  unlimited 


A  FIELD  PROGRAMMABLE  GATE  ARRAY  BASED  SOFTWARE  DEFINED 
RADIO  DESIGN  FOR  THE  SPACE  ENVIRONMENT 

Jeremy  V.  Livingston 
Lieutenant,  United  States  Navy 
B.S., Rensselaer Polytechnic Institute, 2002


Submitted  in  partial  fulfillment  of  the 
requirements  for  the  degree  of 


MASTER  OF  SCIENCE  IN  ELECTRICAL  ENGINEERING 


from  the 


NAVAL  POSTGRADUATE  SCHOOL 
December  2009 


Author:  Jeremy  V.  Livingston 


Approved  by:  Frank  E.  Kragh 

Thesis  Advisor 


Herschel Loomis
Co-Advisor 


Jeffrey  Knorr 

Chairman, Department of Electrical and Computer Engineering




ABSTRACT 
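
This thesis focuses on a software defined radio (SDR) designed to compress a wideband
radio signal input for a narrowband signal output. The design is based on a
Field Programmable Gate Array (FPGA), which is chosen for its reprogrammability,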
flexibility,  and  our  ability  to  introduce  fault  tolerance  into  the  design.  Software  design 
tools  allowed  programming  to  be  done  at  a  high  level,  thereby  allowing  more  progress  on 
the  design.  This  thesis  focuses  on  one  such  SDR  that  was  designed  at  a  high  level  of 
abstraction.  This  thesis  documents  an  analysis  of  the  memory  and  timing  requirements  of 
the  circuit  so  that  it  may  be  used  on  resource-constrained  FPGA  devices.  It  also  explores 
the  operating  capabilities  and  limitations  for  this  circuit  under  various  resource- 
constrained  conditions  and  introduces  algorithms  for  fault  detection  to  make  the  circuit 
more  compatible  with  the  space  environment. 




TABLE  OF  CONTENTS 


I. INTRODUCTION
    A. OBJECTIVES
    B. DESIGN APPROACH
    C. RELATED WORKS
    D. THESIS ORGANIZATION
II. DESIGN TOOLS
    A. FOURIER ANALYSIS
    B. COMPUTING TOOLS
        1. System Generator
        2. Xilinx ISE
        3. Interface
    C. TARGET DEVICES
    D. FOURIER TRANSFORM COMPUTING
        1. Fast Fourier Transform v4.1
            a. Configuration Details
            b. Circuit Timing
            c. Resource Utilization
        2. Fast Fourier Transform v1.0
            a. Configuration Details
            b. Circuit Timing
            c. Resource Utilization
            d. Circuit Limitations
    E. CONCLUSION
III. INITIAL SDR DESIGN
    A. OVERALL FUNCTIONALITY
    B. DESIGN SETUP
    C. BIN ENERGY CALCULATION
        1. Time Windowing Subsystem
            a. Signal Flow
            b. Timing Analysis
            c. Memory Analysis
        2. Frequency Windowing Subsystem
            a. Timing Analysis
            b. Resource Analysis
    D. BIN THRESHOLD ANALYSIS AND DATA MANAGEMENT
        1. Timing Analysis
        2. Resource Analysis
    E. TEMPORARY STORAGE AND OUTPUT CONTROL
        1. Temporary Storage Subsystem
        2. Output Format Subsystem
    F. GENERALIZED CIRCUIT EXPECTATIONS
    G. SUMMARY
IV. INITIAL MODIFICATIONS TO THE ORIGINAL DESIGN
    A. INCREASE COMPRESSION AND MEMORY EFFICIENCY
        1. Theory
        2. Changes Made
            a. Changes to the Time Windowing Subsystem
            b. Changes to Frequency Windowing Subsystem
            c. Changes to Temporary Memory
            d. Changes to Test Configuration
        3. Update to Circuit Generalizations
    B. INTEGRATING NEW FFT IP
        1. Replacing FFTv4.1
        2. Implementation Issues
        3. Changes to Performance Expectations
    C. ADJUSTING HEADER FORMAT AND DOWNLINK CONTROL
        1. Changes to the Header Format
        2. Changes to Downlink Control
        3. Timing Analysis
    D. OPTIMAL MEMORY CONFIGURATIONS
        1. Virtex™-IIP Implementation
        2. Virtex™-I Implementation
    E. SUMMARY
V. FAULT DETECTION
    A. CONSIDERATIONS
        1. SDR Considerations
        2. Parseval’s Theorem
        3. Parity Checking
    B. MODIFICATIONS TO DESIGN
        1. Error Checking Using Parseval’s Theorem
            a. Implementation
            b. Testing
        2. Memory Error Detection
            a. Implementation
            b. Testing
        3. Resource Check
    C. CONCLUSIONS
VI. CONCLUSION
    A. CONCLUSIONS
    B. RECOMMENDATIONS
        1. Bin Overlap
        2. Pipelining
        3. Comprehensive Test Set
        4. Improve User Interface
        5. Explore Other Methods to Compute the FFT
APPENDIX A. IMPLEMENTATION DETAILS
    A. REQUIRED FILES
    B. INSTRUCTIONS
        1. Examine the Simulink® Model
        2. Conduct Incremental Execution of the Test File
        3. Synthesis
    C. CHANGING PARAMETERS
APPENDIX B. ADDITIONAL APPLICATIONS
LIST OF REFERENCES
INITIAL DISTRIBUTION LIST




LIST  OF  FIGURES 


Figure 1. Compiling Instructions [After 11]
Figure 2. FFTv4.1 Implementation [After 16]
Figure 3. FFTv4.1 IP Block, N = 8 [From 3]
Figure 4. FFTv4.1 response to DC input
Figure 5. FFTv4.1 response to streaming DC input
Figure 6. FFTv1.0 Test Circuit
Figure 7. FFTv1.0 System Generator Configuration Options [After 16]
Figure 8. FFTv1.0, Triple memory configuration [From 22]
Figure 9. FFTv1.0 Response to DC input
Figure 10. Performance of the valid signal
Figure 11. Conceptual SDR Model [From 3]
Figure 12. Overall Circuit Design [From 3]
Figure 13. Circuit to Calculate Energy(k) [From 3]
Figure 14. Multiplier IP Configuration [From 16]
Figure 15. Time Window Energy Calculation [From 3]
Figure 16. State Transition Diagram for pwr_time Algorithm
Figure 17. Timing of Output for pwr_time Algorithm, N = 1024, M = 3
Figure 18. State Transition Diagrams for Frequency Analysis Subsystem
Figure 19. Timing of Frequency Windowing Subsystem
Figure 20. State Transition Diagram for out_hdr Algorithm
Figure 21. State Transition Diagram for the Modified pwr_time Algorithm
Figure 22. Timing of Modified pwr_time Algorithm
Figure 23. FFTv1.0 as Used in the SDR
Figure 24. System Generator Configuration for FFTv1.0 [After 16]
Figure 25. FFTv1.0 separated from Compression Algorithm
Figure 26. Compression Algorithm Separated from FFT [After 3]
Figure 27. Initial SDR Design Header Format
Figure 28. Modified SDR Header Format
Figure 29. Modified State Transition Diagram for out_hdr Algorithm
Figure 30. Modified Format Output Subsystem [After 3]
Figure 31. State Transition Diagram for OutputCd Algorithm
Figure 32. Partitioned Compression Algorithm for a Three-FPGA Configuration [After 3]
Figure 33. FFT Subsystem Modified for Error Detection [After 3]
Figure 34. Modification for FFT Error Detection [After 3]
Figure 35. FFT Error Detection Subsystem
Figure 36. State Transition Diagram for the ErrorFlagCd Algorithm
Figure 37. Header Format with Error Code
Figure 38. Error Injection Circuit for FFT Output
Figure 39. 35-bit Parity Generator
Figure 40. Modification for Memory Error Detection (After: [3])
Figure 41. State Transition Diagram for the ParityFlagCd Algorithm
Figure 42. Modification to Communicate Memory Errors [After 3]


LIST  OF  TABLES 


Table 1. Development Software
Table 2. FPGA Memory [From 12-14]
Table 3. FFTv4.1 Resource Utilization on a Virtex™-4 [From 20]
Table 4. FFTv4.1 Resource Utilization on a Virtex™-IIP [From 20]
Table 5. FFTv1.0 Resource Utilization on a Virtex™-I [From 20]
Table 6. Example Memory Expectation Using FFTv4.1 with N = 1024 and M = 3
Table 7. Resource Estimation for SDR design [From 20]
Table 8. Example Memory Configuration Using FFTv1.0 IP
Table 9. Resource Estimation for Bin Analysis [From 20]
Table 10. Resource Estimation for Temporary Storage and Downlink Control [From 20]
Table 11. Format of the par_code Signal
Table 12. Virtex™-IIP Resources Required for Error Detection (From: [20])
Table 13. Important Sub Directories Available on DVD
Table 14. M-Code files External to the SDR Design
Table 15. Simulink® Model Files Used in this Thesis Work
Table 16. M-Code files Used in the Initial SDR Design, as Discussed in Chapter III
Table 17. M-Code Files Added or Adjusted for Changes to the SDR Design
Table 18. Storage Devices Sensitive to Changes in N and M (After: [3])


LIST  OF  ACRONYMS  AND  ABBREVIATIONS 


BRAM    Block Random Access Memory
CLB     Configurable Logic Block
CFTP    Configurable Fault-Tolerant Processor
CRL     Communications Research Laboratory
DC      Direct Current
DFT     Discrete Fourier Transform
DSP     Digital Signal Processing
DTFT    Discrete Time Fourier Transform
FIFO    First-In, First-Out Memory
FFT     Fast Fourier Transform
FPGA    Field Programmable Gate Array
FSM     Finite State Machine
FT      Fourier Transform
GCLK    Global Clock Buffer
GUI     Graphical User Interface
HDL     Hardware Description Language
IDFT    Inverse Discrete Fourier Transform
IDTFT   Inverse Discrete Time Fourier Transform
IF      Intermediate Frequency
IOB     Input/Output Block
IP      Intellectual Property
ISE     Integrated Synthesis Environment
LUT     Look-Up Table
NPS     Naval Postgraduate School
ORS     Operationally Responsive Space
RAM     Random Access Memory
RFD     Ready for Data
ROI     Range of Interest
RPR     Reduced Precision Redundancy
SDR     Software Defined Radio
SEU     Single Event Upset
TMR     Triple Modular Redundancy
VHDL    VHSIC Hardware Description Language
VHSIC   Very High Speed Integrated Circuit
XOR     Exclusive Or


EXECUTIVE  SUMMARY 


The acquisition of satellite systems is a process that typically takes several years
from the identification of a need to the delivery of a capability. Operationally
Responsive Space (ORS) is a strategy that strives to deliver space-based assets to the
warfighter in a timely manner. Two important areas of concern are improving the flexibility
of  satellite  designs  and  streamlining  the  acquisition  process.  The  use  of  space-based 
Field  Programmable  Gate  Arrays  (FPGAs)  can  leverage  both  of  these  key  ideas.  This 
thesis  work  is  intended  to  demonstrate  the  feasibility  of  using  a  space-based  FPGA 
system  as  an  option  for  ORS.  This  is  accomplished  by  implementing  a  tactically  relevant 
Software  Defined  Radio  (SDR)  algorithm  in  a  configuration  suitable  for  the  space 
environment. 

An FPGA-based SDR design was developed through a previous thesis project
conducted  by  a  student  working  through  the  Naval  Postgraduate  School  (NPS) 
Communications  Research  Laboratory  (CRL).  The  circuit  computes  the  Fast  Fourier 
Transform  (FFT)  of  a  sampled  real-time  Intermediate  Frequency  (IF)  signal.  The 
complex  FFT  result  is  stored  in  temporary  memory  while  the  energy  in  the  signal  is 
analyzed.  The  circuit  uses  operator-defined  time  windows  and  frequency  Ranges  of 
Interest  (ROI)  to  organize  the  FFT  output  into  time-frequency  bins.  Each  bin  is 
compared  with  a  user-defined  minimum  energy  threshold.  The  circuit  compresses  the 
FFT  output  by  discarding  all  time-frequency  bins  that  do  not  meet  the  minimum 
threshold.  Bins  that  pass  the  threshold  analysis  are  retrieved  from  temporary  memory 
and  forwarded  to  the  circuit’s  output.  The  circuit  is  also  designed  so  that  N,  the  number 
of  points  processed  by  the  FFT,  can  be  adjusted. 
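
The compression step described above can be illustrated with a minimal Python sketch. This is not the System Generator implementation from the previous thesis work; the function names (dft, compress), the dictionary return format, and the example threshold are assumptions made for illustration, and a naive DFT stands in for the FPGA FFT core.

```python
import cmath
import math


def dft(samples):
    """Naive O(N^2) DFT; a stand-in for the hardware FFT core."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * i / n) for i, x in enumerate(samples))
        for k in range(n)
    ]


def compress(frames, rois, threshold):
    """Keep only the time-frequency bins whose energy meets the threshold.

    frames: M consecutive blocks of N real samples (one time window).
    rois: list of (first_index, last_index) FFT index ranges, inclusive.
    Returns {roi: [spectrum slice per frame]} for the bins that pass.
    """
    spectra = [dft(frame) for frame in frames]
    kept = {}
    for lo, hi in rois:
        # Bin energy: sum of |X[k]|^2 over the ROI and over all M frames.
        energy = sum(abs(s[k]) ** 2 for s in spectra for k in range(lo, hi + 1))
        if energy >= threshold:
            kept[(lo, hi)] = [s[lo:hi + 1] for s in spectra]
    return kept


# Hypothetical test case: N = 8, M = 3, a tone centered on FFT index 1.
frames = [[math.cos(2 * math.pi * i / 8) for i in range(8)] for _ in range(3)]
kept = compress(frames, rois=[(1, 2), (3, 4)], threshold=1.0)
print(sorted(kept))  # only the ROI containing the tone is forwarded
```

In this sketch the FFT values for a discarded bin are simply never returned, which mirrors the idea of retrieving only the passing bins from temporary memory.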

The  initial  SDR  design  was  created  and  tested  at  a  high  level  of  abstraction  using 
System  Generator  software  produced  by  Xilinx.  This  software  interfaces  with  the 
MATLAB®/Simulink® environment. System Generator provides a library of pre-designed
Intellectual Property (IP) modules which perform functions within a Simulink®
model. The Simulink® model can be used to generate files required to program an FPGA
with the design. Tests were conducted in the MATLAB®/Simulink® environment to
verify functionality of the design, as documented in previous thesis work. The design
was tested using a configuration with N = 8 on a Virtex™-4 FPGA.

The first goal of this research was to find ways to implement the design with a
high value of N within the resource constraints of selected FPGA models. Although the
initial SDR design functions in the MATLAB®/Simulink® environment, when N is
increased from N = 8 to N = 1024, the design requires more memory resources than are
available on a Virtex™-4 FPGA. The high level of abstraction used for the initial SDR
design allowed the designer to bypass a detailed analysis of the circuit’s timing and
resource requirements. This led to several inefficiencies throughout the design that
needed to be corrected for the circuit to function with a high value of N.

This thesis describes an analysis of the circuit elements produced using System
Generator to better understand the function of the design. IP modules used to compute
the FFT were examined by conducting a series of tests in the MATLAB®/Simulink®
environment. The results of these tests verified that the expected output signals are
generated from a series of different input signals. The Xilinx Integrated Synthesis
Environment (ISE) Project Navigator software was used to check the FPGA resources
required for these IP modules. Post-synthesis timing was verified for some tests using
ModelSim® simulation software.

The FFT IP module used in the initial SDR design is the FFTv4.1 IP block. The
data sheet for this IP circuit design explains which target FPGA devices can use it.
Eligible target devices include devices in the Virtex™-4 and Virtex™-IIP families of
FPGAs. The list of eligible devices excludes the Virtex™-I FPGA family, which is the
target device for several legacy space-configured FPGA systems including the NPS
Configurable Fault Tolerant Processor (CFTP) experiment. For this reason, an FFT IP
module suitable for the Virtex™-I was also examined. A look at all IP modules in the
System Generator library revealed that the FFTv1.0 IP block is the only one that meets
this constraint.


The FFTv1.0 IP block functions differently from the FFTv4.1 IP block in several
ways. The FFTv4.1 IP block samples the input signal every clock cycle, while the
FFTv1.0 IP block samples the input signal once every four clock cycles. The FFTv4.1 IP
block can be configured for full-rate pipeline operations so that after an initial latency, a
valid output signal is produced on every clock cycle. In contrast, the FFTv1.0 IP block
produces a single burst of N valid output signals every 4N clock cycles. Because the
initial SDR is configured to accept the continuous streaming output of the FFTv4.1 IP
block, the difference between these FFT IP circuits means that changes to the initial SDR
design are required if a Virtex™-I FPGA is the desired target device.
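
The practical effect of this timing difference can be worked out with simple arithmetic. The Python sketch below uses only the sampling intervals stated above; the helper names and the 100 MHz clock are hypothetical values chosen for illustration, and initial latency is ignored.

```python
# Clock cycles between accepted input samples, per the description above.
SAMPLE_INTERVAL = {"FFTv4.1": 1, "FFTv1.0": 4}


def cycles_per_frame(n, core):
    """Steady-state clock cycles needed to take in one N-point frame."""
    return SAMPLE_INTERVAL[core] * n


def max_sample_rate_hz(clock_hz, core):
    """Highest sustained input sample rate for a given clock frequency."""
    return clock_hz / SAMPLE_INTERVAL[core]


# Hypothetical operating point: N = 1024 with a 100 MHz clock.
for core in SAMPLE_INTERVAL:
    print(core, cycles_per_frame(1024, core), max_sample_rate_hz(100e6, core))
```

Under these assumptions, the burst-mode core sustains one quarter of the input sample rate of the streaming core at the same clock frequency.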

In  addition  to  analyzing  the  FFT  IP  modules,  the  control  algorithms  that  govern 
signal  flow  through  the  circuit  were  examined.  Simulations  in  the 
MATLAB®/Simulink® environment were used to create state transition diagrams
explaining the behavior of the control algorithms. The results of this analysis were used
to create expressions for the timing expectations in terms of the size of the FFT sample
period N, the number of FFT periods in each time window M, and the sum of the sizes of
all user-defined ROI. Expressions for circuit timing expectations can be used to
determine  how  long  information  needs  to  be  stored  in  memory,  which  determines  the 
minimum  memory  required  for  the  circuit.  This  information  can  be  used  to  create  a 
circuit  configuration  that  maximizes  functionality  for  a  given  set  of  resource  constraints. 

The  second  goal  of  this  research  was  to  make  improvements  to  the  design  that 
increase  downlink  efficiency  by  taking  advantage  of  the  conjugate-symmetric  property 
associated  with  the  FFT  of  real  signals.  Using  this  property,  the  amount  of  FFT  output 
information required to reproduce a real input signal can be cut in half. Adjustments
to the memory allocation and control algorithms were made to remove inefficiencies
discovered through the circuit analysis and to implement savings using the FFT
conjugate-symmetric property. The downlink algorithms were also adjusted to improve signal flow
to  an  external  communication  system. 
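
The conjugate-symmetric property can be demonstrated with a short Python sketch. For a real input of even length N, the spectrum satisfies X[N-k] = conj(X[k]), so indices 0 through N/2 are enough to rebuild the rest. The helper names (dft, half_spectrum, rebuild_full) are assumptions for illustration and do not correspond to blocks in the design.

```python
import cmath


def dft(samples):
    """Naive O(N^2) DFT; a stand-in for the hardware FFT core."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * i / n) for i, x in enumerate(samples))
        for k in range(n)
    ]


def half_spectrum(samples):
    """Return only FFT indices 0..N/2 of a real, even-length input."""
    return dft(samples)[: len(samples) // 2 + 1]


def rebuild_full(half, n):
    """Recover indices N/2+1..N-1 from X[N-k] = conj(X[k])."""
    full = list(half)
    for k in range(n // 2 + 1, n):
        full.append(half[n - k].conjugate())
    return full


x = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.0, 1.5]  # arbitrary real samples
rebuilt = rebuild_full(half_spectrum(x), len(x))
worst = max(abs(a - b) for a, b in zip(dft(x), rebuilt))
print(worst < 1e-9)
```

Strictly, N/2 + 1 of the N values are retained, so the saving approaches one half as N grows.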

The resulting circuit was configured for a Virtex™-IIP target device. System
Generator was used to create a Xilinx ISE project. Xilinx ISE Project Navigator software
was used to synthesize the design, which produced a resource estimation confirming that
the design would fit within the resource constraints of the target device. In similar
fashion, the design was configured for a Virtex™-I target device using the FFTv1.0 IP
block. A multi-chip implementation was examined, where three Virtex™-I FPGAs
connected  in  series  were  used.  Simulation  in  MATLAB®/Simulink®  and  resource 
estimation  provided  by  synthesis  in  Xilinx  ISE  Project  Navigator  confirmed  that  this  is  a 
feasible  means  of  implementing  the  SDR  design. 

The  final  goal  of  this  research  was  to  add  fault  detection  algorithms  to  make  the 
design  more  suitable  for  the  space  environment.  The  FFT  algorithm  and  temporary 
storage  memory  were  identified  as  the  most  likely  locations  for  faults  within  the  design 
because  of  the  large  percentage  of  FPGA  resources  dedicated  to  each.  The  fault  detection 
and  correction  methods  of  Triple  Modular  Redundancy  (TMR)  and  Reduced  Precision 
Redundancy  (RPR)  were  examined.  These  methods  were  not  used  because  employing 
them  effectively  would  require  more  resources  than  were  available  on  the  FPGA. 

Parseval’s  Theorem  was  used  to  relate  the  energy  of  the  input  signal  to  the  energy 
of  the  output  signal  in  a  way  that  involves  fewer  computations  than  the  FFT.  After  this 
method  was  implemented  in  the  design,  tests  showed  that  the  circuit  was  capable  of 
detecting  FFT  errors.  Parity  checking  was  implemented  to  check  for  errors  in  temporary 
storage memory. Tests showed that the circuit would detect any odd number of bit faults in each
data word stored in memory. The downlink format was adjusted to communicate faults
detected  in  either  the  FFT  or  memory  to  the  end  user. 
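
Both checks can be sketched in a few lines of Python. Parseval’s Theorem for the DFT states that the sum of |x[n]|^2 equals (1/N) times the sum of |X[k]|^2, so an FFT result can be screened by comparing two energy sums. A single parity bit detects any odd number of flipped bits in a word. The function names, the tolerance, and the 4-bit example word are assumptions for illustration; they are not taken from the design.

```python
import cmath


def dft(samples):
    """Naive O(N^2) DFT; a stand-in for the hardware FFT core."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * i / n) for i, x in enumerate(samples))
        for k in range(n)
    ]


def parseval_ok(samples, spectrum, rel_tol=1e-6):
    """Compare time-domain energy with (1/N) * frequency-domain energy."""
    n = len(samples)
    time_energy = sum(abs(v) ** 2 for v in samples)
    freq_energy = sum(abs(v) ** 2 for v in spectrum) / n
    return abs(time_energy - freq_energy) <= rel_tol * max(time_energy, 1.0)


def parity_bit(word):
    """Even-parity bit: 1 when the word holds an odd number of ones."""
    return bin(word).count("1") & 1


def parity_ok(word, stored_parity):
    """True when the word read back still matches its stored parity bit."""
    return parity_bit(word) == stored_parity


x = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.0, 1.5]
spectrum = dft(x)
corrupted = list(spectrum)
corrupted[0] *= 2  # injected fault in one FFT output value
print(parseval_ok(x, spectrum), parseval_ok(x, corrupted))

word = 0b1011
stored = parity_bit(word)
print(parity_ok(word ^ 0b0100, stored))  # one flipped bit: detected
print(parity_ok(word ^ 0b0110, stored))  # two flipped bits: not detected
```

Neither check is complete on its own: an FFT error that happens to preserve total energy passes the Parseval comparison, and an even number of bit flips passes the parity check.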

The  desired  end  state  of  this  research  was  an  improved  version  of  the  initial  SDR 
design  with  increased  compression  and  fault  detection  capability  that  can  be  loaded  on  an 
FPGA  configured  for  space.  This  objective  was  met.  The  research  demonstrated  that  a 
tactically  relevant  SDR  design  can  be  configured  for  use  in  the  space  environment. 

To further develop this design toward use in actual spacecraft, the following
additional research is required:

• Adjust the algorithm to compensate for a condition where the user defines
two ROIs that overlap. This currently leads to inefficiency because the overlapping
information is transmitted twice, once for each ROI.

• Use pipelining features available in System Generator IP block interfaces
to reduce the clock period. This will require adjustments to some of the
circuit’s timing algorithms.

• Use a larger range of input signals and user-defined configurations to test
the design under more stressful conditions than those imposed in tests
used for development. This should include using a broader range of fault
injection experiments to test the sensitivity of fault detection algorithms.

• Improve the user interface to prevent poor user-defined
configuration choices and facilitate the set-up process.

• Explore other FPGA circuits to compute the FFT. This will provide more
options and flexibility with ongoing circuit development. An FFT design
using more elementary System Generator IP blocks could be used to
implement internal fault detection and correction algorithms.

The  ever-changing  nature  of  threats  to  United  States  national  interests  generates 
an  increasing  demand  for  flexibility  in  space  architecture.  This  design  is  an  important 
part  of  on-going  research  to  meet  the  demand  through  space-based  FPGA  designs. 

I. INTRODUCTION


The acquisition of satellite systems is a process that typically takes several years
from the identification of a need to the delivery of a capability. Operationally
Responsive Space (ORS) is a strategy that strives to deliver space-based assets to the
warfighter in a timely manner. Two important areas of concern are improving the flexibility
of  satellite  designs  and  streamlining  the  acquisition  process.  The  use  of  space-based 
Field  Programmable  Gate  Arrays  (FPGAs)  can  leverage  both  of  these  key  ideas.  [1] 

An  FPGA  can  be  reprogrammed  remotely  to  run  different  algorithms.  A  satellite 
with  an  FPGA  could  be  reconfigured  to  meet  a  different  mission  on  orbit,  improving  the 
flexibility  of  the  asset.  This  also  helps  to  streamline  the  acquisition  process  since  FPGA 
design  significantly  reduces  the  amount  of  time  involved  with  integration  and  testing. 
Reprogramming  a  space-based  FPGA  for  a  new  mission  completely  bypasses  the  time, 
cost,  and  risk  involved  with  launching  a  new  satellite.  [2] 

An  FPGA  design  with  potential  for  space  applications  was  presented  in  [3].  This 
initial  SDR  design  is  a  signal  analysis  algorithm  that  compresses  the  output  of  a  Fast 
Fourier  Transform  (FFT)  computation.  The  design  was  created  and  tested  at  a  high  level 
of  abstraction,  using  the  Xilinx  System  Generator  interface  with  the 
MATLAB®/Simulink®  environment.  It  analyzes  the  information  produced  by  the  FFT 
computation  to  determine  if  the  energy  in  user-defined  frequency  Ranges  of  Interest 
(ROI)  meets  user-defined  thresholds.  ROI  that  meet  their  threshold  are  forwarded  to  a 
downlink  algorithm.  ROI  that  do  not  meet  their  threshold  are  discarded.  [3] 

A. OBJECTIVES

This  thesis  work  is  intended  to  demonstrate  the  feasibility  of  using  a  space-based 
FPGA  system  as  an  option  for  ORS.  This  is  accomplished  by  implementing  a  tactically 
relevant  SDR  algorithm  in  a  configuration  suitable  for  the  space  environment.  The 
desired  end  state  is  an  improved  version  of  the  design  with  increased  compression  and 
fault  detection  capability  that  can  be  loaded  on  an  FPGA  configured  for  space. 

This thesis work produces an analysis of the hardware requirements associated
with  the  initial  SDR  design.  The  information  gained  through  this  analysis  is  used  to 
make  changes  to  the  design  to  ensure  that  it  makes  the  best  use  of  scarce  FPGA  memory 
resources.  Improvements  are  also  made  to  increase  the  efficiency  of  the  data 
compression. 

This  research  explores  methods  for  improving  the  algorithm’s  fault  detection 
capability.  Fault  detection  is  essential  because  it  enables  ground-based  systems  to  assess 
the  validity  of  the  data  produced.  Fault  detection  is  also  necessary  to  determine  when 
fault correction procedures are required, such as reloading the FPGA configuration.

B. DESIGN APPROACH

This  thesis  work  begins  with  an  examination  of  the  initial  SDR  design  described 
in  [3].  Simulations  to  examine  the  performance  of  the  design  and  internal  timing  are 
conducted  in  the  MATLAB®/Simulink®  environment.  The  System  Generator  interface 
is  used  to  generate  the  design  in  Very  High  Speed  Integrated  Circuit  (VHSIC)  Hardware 
Description  Language  (VHDL)  format.  Synthesis  of  the  design  from  VHDL  format  is 
conducted  using  the  Xilinx  Integrated  Synthesis  Environment  (ISE)  Project  Navigator. 
Post-synthesis  resource  analysis  is  conducted  in  the  Xilinx  ISE  Project  Navigator 
environment. 

Modifications  to  the  design  discussed  in  Section  B  are  made  incrementally.  After 
each  change,  the  design  is  simulated  using  a  baseline  configuration  and  test  set  described 
in  [3].  The  effectiveness  of  each  modification  is  evaluated  based  on  the  conformance  of 
simulation  results  to  expectations  derived  from  the  baseline  test  set.  Any  changes  to  the 
tests  are  noted  where  appropriate. 

C.  RELATED  WORKS 

This  thesis  work  is  primarily  based  on  the  initial  SDR  design  work  described  in 
[3].  Although  not  specifically  related  to  this  work,  several  organizations  are  developing 
space-based FPGA designs. One such design is the Programmable Satellite Transceiver
presented in [4]. Wright describes other areas of research in signals analysis and the use
of System Generator for FPGA design [3].

The work done to provide a fault detection capability in this design is closely
related to the work described in [5] and [6]. Snodgrass presents a new method of error
detection and correction called Reduced Precision Redundancy (RPR) [5]. This concept
is discussed further in [7]. Sullivan expands on the work in [5] by examining the
application of RPR to Digital Signal Processing and spacecraft attitude control [6]. This
research, also described in [8], explores the feasibility of RPR for elementary algorithms
then expands the scope to multi-level algorithms. This includes a study of using RPR to
correct errors in the computation of the Fast Fourier Transform algorithm.

D. THESIS ORGANIZATION

This introductory chapter provides some background information and explains the
objectives of the thesis work. The design approach is described and the thesis work is
placed in the context of other related research.

Chapter II, Design Tools, discusses the resources that were used for the design.
The chapter presents basic information about the FFT algorithm. The hardware and
software tools selected for the design are listed. The chapter concludes with an analysis
of the LogiCore FFTv1.0 and Xilinx FFTv4 Intellectual Property (IP) that were used to
implement the FFT algorithm in hardware on Xilinx FPGAs.

Chapter III, Initial SDR Design, examines an SDR design that was introduced in
[3]. The chapter presents graphical representations of certain design elements to improve
understanding of their functionality. The chapter introduces an analysis of the design’s
timing and resource requirements. The chapter concludes by presenting generalized
expressions for the circuit’s latency and memory requirements based on the circuit’s
configuration and user-defined constraints.

Chapter IV, Initial Modifications to the Original Design, presents changes to the
initial SDR design that make it more feasible for use on a resource-constrained FPGA.
The conjugate symmetry property of Fourier Transforms is used to increase downlink and
memory efficiency. The header generation algorithm and Format Output subsystem are
modified to improve efficiency and the ability to send signals to an external
communications system. Modifications are made to accommodate the FFTv1.0 IP,
enabling use on a Virtex™-I FPGA. The effectiveness of these changes is demonstrated
by synthesizing the design for both a Virtex™-I FPGA and a Virtex™-IIP FPGA,
producing an estimate of resource utilization for each.
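
The conjugate symmetry property mentioned in this summary can be illustrated with a short Python sketch. NumPy is not part of the thesis toolchain, and the signal and the length N = 16 are arbitrary assumptions chosen only for illustration. For a real-valued input, X[N-k] equals the complex conjugate of X[k], so only bins 0 through N/2 need to be stored or transmitted.

```python
import numpy as np

# Illustrative real-valued input; length and contents are assumptions.
N = 16
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, N)

X = np.fft.fft(x)

# For real x, X[N-k] is the complex conjugate of X[k] for k = 1..N-1.
k = np.arange(1, N)
symmetric = np.allclose(X[N - k], np.conj(X[k]))

# Keep only bins 0..N/2, then rebuild the redundant upper half from them.
half = X[: N // 2 + 1]
rebuilt = np.concatenate([half, np.conj(half[-2:0:-1])])
```

Here `rebuilt` matches the full spectrum, which suggests why roughly half of the FFT points carry all of the information for a real input.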

Chapter  V,  Fault  Detection,  introduces  methods  that  would  enable  the  design  to 
recognize  when  errors  occur  in  either  FFT  computing  or  in  temporary  memory  storage. 
After  reviewing  the  feasibility  of  various  redundant  algorithms,  a  method  using 
Parseval’s  theorem  to  compare  the  FFT  input  with  the  FFT  output  is  explained.  The 
Parseval-related  method  is  demonstrated  as  a  resource-efficient  means  of  detecting  errors 
in  the  FFT  computation.  Parity  checking  is  demonstrated  as  a  means  to  detect  errors  in 
temporary  memory  storage. 
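
As a rough illustration of the Parseval-based check summarized above, the following Python sketch compares the energy of an FFT input with the energy of its output. It uses floating-point NumPy rather than the fixed-point hardware of the thesis, and the function name `parseval_check`, the tolerance, and the injected fault are all hypothetical.

```python
import numpy as np

def parseval_check(x, X, rel_tol=1e-9):
    """Return True if sum|x[n]|^2 matches (1/N) * sum|X[k]|^2 within rel_tol."""
    n_points = len(x)
    energy_in = np.sum(np.abs(x) ** 2)
    energy_out = np.sum(np.abs(X) ** 2) / n_points
    return bool(abs(energy_in - energy_out) <= rel_tol * max(energy_in, 1e-300))

N = 1024
x = 0.5 * np.cos(2 * np.pi * 37 * np.arange(N) / N)   # assumed test tone
X = np.fft.fft(x)
ok_clean = parseval_check(x, X)

# Hypothetical upset: one output bin is corrupted after the computation.
X_faulty = X.copy()
X_faulty[37] *= 2.0
ok_faulty = parseval_check(x, X_faulty)
```

The clean transform passes the check and the corrupted one fails it. A fixed-point implementation would need a tolerance sized to its quantization error.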

Chapter VI, Conclusions, discusses the significance of the information presented
in this thesis. The chapter also presents recommendations for future work.

II. DESIGN TOOLS


Implementing and analyzing the SDR requires the use and understanding of
certain mathematical concepts and design tools. This chapter presents some basic
information regarding Fourier analysis and FPGA design using the Xilinx System
Generator tool. It also discusses some design considerations regarding the specific
hardware and IP circuitry used. This chapter shows how the theory presented is
implemented in hardware.

A.  FOURIER  ANALYSIS 


The use of Fourier analysis to examine the characteristics of signals is discussed in
[3]. This section reviews the process and adds more information regarding the efficient
implementation of Fourier analysis using FFTs, as discussed in [9].


The Fourier Transform (FT) is used to analyze signals in the frequency domain.
As discussed in [9], the frequency domain representation of the time domain signal x(t)
is

$$X(f) = \mathrm{FT}\{x(t)\} = \int_{-\infty}^{+\infty} x(t)\, e^{-j 2\pi f t}\, dt. \tag{III.1}$$


The frequency domain representation of the signal can be converted back to the time
domain using the Inverse Fourier Transform (IFT),

$$x(t) = \mathrm{IFT}\{X(f)\} = \int_{-\infty}^{+\infty} X(f)\, e^{j 2\pi f t}\, df. \tag{III.2}$$


As shown in Equations (III.1) and (III.2), the Fourier transform and inverse Fourier
transform are dependent on an infinite set of continuous samples. The Discrete Time
Fourier Transform (DTFT), which provides a means for calculating the frequency
representation of a discrete time sequence of infinite length, is also discussed in [9]. The
variable t is replaced with the variable n, indicating that the points are sampled discretely.
The variables are related by the expression t = nT, where T is the sampling period and n
is required to be an integer. The sampled signal is x[n] = x(nT). The variable n is used
in the expression


$$X(f') = \mathrm{DTFT}\{x[n]\} = \sum_{n=-\infty}^{+\infty} x[n]\, e^{-j 2\pi f' n}. \tag{III.3}$$


In this expression, f' is the frequency in units of cycles per time index. For frequencies
f in [-0.5 f_s, 0.5 f_s], X(f) = X(f' f_s) if there is no aliasing and the sampling frequency
is f_s = 1/T. Similarly, the Inverse Discrete Time Fourier Transform (IDTFT) provides
the discrete time sequence associated with the frequency domain representation, as shown
in the expression

$$x[n] = \mathrm{IDTFT}\{X(f')\} = \int_{-1/2}^{+1/2} X(f')\, e^{j 2\pi f' n}\, df'. \tag{III.4}$$

As with the Fourier transform, the DTFT requires the analysis of a signal over all
time. The IDTFT requires the analysis of an infinite number of frequencies. As
discussed in [9], the frequency domain representation of a time domain signal can be
estimated from a finite set of samples of length N using the discrete Fourier transform as
shown in the expression

$$X[k] = \mathrm{DFT}\{x[n]\} = \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi k n / N}, \quad \text{for } k = 0, \ldots, N-1. \tag{III.5}$$


In this expression, k = f'N. The variable k is related to the continuous frequency f as
shown in the expression

$$k = f' N = \frac{f}{f_s}\, N. \tag{III.6}$$
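
The relationship between the bin index k and the continuous frequency f can be sketched in Python. The sketch assumes f = k f_s / N, which follows from k = f'N and f = f' f_s as stated in the text, and it reports bins at or above N/2 as negative frequencies. The function name `bin_to_frequency` and the sampling rate used in the example are hypothetical.

```python
import numpy as np

def bin_to_frequency(k, n_points, fs):
    """Map DFT bin index k to a frequency in Hz, assuming f = k * fs / N.

    Bins at or above N/2 are reported as negative frequencies, matching
    the interval [-0.5*fs, 0.5*fs) used in the text.
    """
    k = np.asarray(k)
    k_signed = np.where(k >= n_points / 2, k - n_points, k)
    return k_signed * fs / n_points

# Example with an assumed 8 kHz sampling rate and N = 8.
freqs = bin_to_frequency(np.arange(8), 8, 8000.0)
```

The result agrees with `numpy.fft.fftfreq(8, d=1/8000.0)`, which uses the same bin ordering.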


The inverse discrete Fourier transform reverses the process, producing the time domain
signal of length N from the frequency domain signal defined on the set of N discrete
frequencies, as shown in the expression

$$x[n] = \mathrm{IDFT}\{X[k]\} = \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, e^{j 2\pi k n / N}, \quad \text{for } n = 0, \ldots, N-1. \tag{III.7}$$

To simplify these expressions the phase factor is used. This element is also
sometimes referred to as a “twiddle factor,” defined by the expression

$$W_N = e^{-j 2\pi / N}. \tag{III.8}$$


The straightforward DFT calculation for each element of the discrete frequency
vector X[k] requires N complex multiplication and addition operations. The
computation of the entire set requires N^2 operations. The FFT is a means of calculating
the same result by breaking up the expression in a way that uses fewer operations.
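
The operation count of the straightforward DFT can be made concrete with a small Python sketch. It is only an illustration of the direct summation using the phase factor W_N; the function name `dft_direct` and the eight-point test vector are assumptions, and NumPy's FFT is used purely as a reference for the result.

```python
import numpy as np

def dft_direct(x):
    """Direct DFT: X[k] = sum over n of x[n] * W_N**(n*k), W_N = exp(-j*2*pi/N).

    Returns the spectrum and the number of complex multiplications used.
    """
    x = np.asarray(x, dtype=complex)
    n_points = len(x)
    w = np.exp(-2j * np.pi / n_points)      # phase ("twiddle") factor W_N
    X = np.zeros(n_points, dtype=complex)
    mults = 0
    for k in range(n_points):               # N output points ...
        for n in range(n_points):           # ... each needing N products
            X[k] += x[n] * w ** (n * k)
            mults += 1
    return X, mults

x = np.arange(8, dtype=float)               # assumed test vector
X, mults = dft_direct(x)
```

For N = 8 the loop performs 64 multiplications, which is N^2, and the result matches `numpy.fft.fft(x)`.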

The radix-2 FFT implementation divides the DFT expression into its even and
odd components, as shown in the equation

$$X[k] = \sum_{m=0}^{N/2-1} x[2m]\, W_N^{2mk} + \sum_{m=0}^{N/2-1} x[2m+1]\, W_N^{(2m+1)k}. \tag{III.9}$$

The DFT calculation can be conducted in N log_2 N operations. [9]
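
The even and odd decomposition can be sketched as a recursive decimation-in-time routine in Python. This is a textbook illustration under the assumption that N is a power of two. It is not the architecture of the Xilinx IP cores, and the name `fft_radix2` is hypothetical.

```python
import numpy as np

def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT for power-of-two lengths."""
    x = np.asarray(x, dtype=complex)
    n_points = len(x)
    if n_points == 1:
        return x
    if n_points % 2:
        raise ValueError("length must be a power of two")
    even = fft_radix2(x[0::2])              # N/2-point DFT of x[2m]
    odd = fft_radix2(x[1::2])               # N/2-point DFT of x[2m+1]
    k = np.arange(n_points // 2)
    twiddle = np.exp(-2j * np.pi * k / n_points)   # W_N**k
    # The lower half uses +W_N**k and the upper half uses -W_N**k.
    return np.concatenate([even + twiddle * odd, even - twiddle * odd])

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 16)              # assumed test vector
X = fft_radix2(x)
```

Each level of recursion does on the order of N operations and there are log_2 N levels, which is where the N log_2 N figure comes from.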


B. COMPUTING TOOLS

Several  computational  tools  were  used  to  design,  test,  and  implement  the  SDR  in 
hardware.  Wright  discusses  the  FPGA  design  process  in  detail  [3].  This  section 
discusses  the  configuration  of  these  tools,  highlighting  specific  details  critical  to 
reproducing  results  described  in  the  remainder  of  the  thesis. 

1. System Generator

As  discussed  in  [10],  System  Generator  is  a  hardware  design  tool  produced  by 
Xilinx.  It  produces  Intellectual  Property  (IP)  based  pre-designed  circuitry  in  a  format  that 
can  be  inserted  in  a  signal  flow  path  in  the  MATLAB®/Simulink®  environment.  This  IP 
circuitry  is  made  available  through  the  Xilinx  ISE  software  package.  Circuit  designers 
who  wish  to  use  System  Generator  are  encouraged  to  use  the  training  package  available 
with the software. The package is labeled as a series of labs (1-7), located in the
following path:

\Xilinx\10.1\DSP_Tools\sysgen\examples\getting_started_training

The MATLAB®/Simulink® environment offers several advantages when
designing a circuit to be used for signal analysis. The computation tools in MATLAB®
provide a means of easily generating input signals and interpreting output. The circuit
diagram is easy to construct and visualize. Using the System Generator interface, the
design can be compiled to any level from the Hardware Description Language (HDL)
netlist down to the bitstream file required to program the target FPGA.

One disadvantage of designing in System Generator is that the extra layer of
abstraction from the IP blocks disables some of the configuration options that would
otherwise be available. System Generator is not a standard tool used in circuit design at
this time, so many IP blocks are not available to be implemented in this environment.
The way to work around these shortcomings is to use System Generator for high-level
design and testing, and then compile the circuit to the HDL netlist level. From there,
other design tools can be used to configure the circuit and design interfaces to other IP
blocks.


2. Xilinx ISE

The Xilinx ISE Design Suite is another tool that was used in this design to
configure the HDL netlist produced by System Generator. The Xilinx ISE interfaced
with the ModelSim® simulator to confirm the results of tests that were conducted in the
MATLAB®/Simulink® environment.

3.  Interface 

In order for the design environment to function, all software listed in this section
must work together. The versions of each piece of software used for this design are
shown in Table 1.

Software                              Version          Purpose
MATLAB®/Simulink®                     7.4.0 (R2007a)   Configure the design. Generate input. Interpret output.
Xilinx ISE, Xilinx System Generator   10.1             Define the design structure and configure components.
ModelSim®                             SE 6.3g          Simulate the design.

Table 1. Development Software.


After  installing  all  components,  the  Xilinx  libraries  must  be  compiled  for 
ModelSim®.  This  can  be  accomplished  by  entering  the  compxlib  command  in  the 
prompt  available  in  Xilinx  ISE  Project  Navigator  under  the  “TCL  Shell”  tab.  The 
command  must  be  entered  with  the  parameters  shown  in  Figure  1.  Although  divided 
between  two  lines  for  easier  reading,  the  command  is  entered  without  a  carriage  return 
until  the  end. 

compxlib -s mti_se -arch all -lib all -l all -dir C:\Xilinx92i
-log compxlib.log -w -p C:\Modeltech_6.3g\win32 -smartmodel_setup

Figure  1.  Compiling  Instructions  [After  11]. 

C.  TARGET  DEVICES 

This  design  is  intended  for  the  Xilinx  Virtex™  FPGA  family.  This  section 
presents  the  capabilities  of  three  devices  within  that  family,  specifically  focusing  on  the 
available  block  memory.  Memory  is  a  critical  constraint  for  the  SDR  design,  which  can 
be  tailored  for  any  of  these  devices.  The  design’s  capabilities  and  limitations  change, 
depending  on  the  target  device.  This  will  be  discussed  in  greater  detail  later  in  this 
document. 

Table 2 illustrates the memory capacities for three different Xilinx FPGA devices.
The Virtex™-I device was selected as a baseline FPGA used in several legacy space
systems. The Virtex™-II Pro device was selected for comparison because it is the
earliest model that is capable of supporting the FFTv4.1 IP circuit. The Virtex™-4
device was selected because it was a readily available target device used for the initial
SDR design.


FPGA Series      Device     Block RAM (BRAM)   Blocks   Description
Virtex™-I        xcv1000    16 kB              32       512-Byte Blocks
Virtex™-II Pro   xc2vp20    1584 kB            88       18 kB Blocks
Virtex™-4        xc4vlx25   1296 kB            72       18 kB Blocks

Table 2. FPGA Memory [From 12-14].
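
As a quick arithmetic check, the totals in Table 2 can be reproduced by multiplying the number of blocks by the block size. The Python sketch below takes the values and units exactly as listed in the table; the dictionary layout and variable names are hypothetical.

```python
# Values copied from Table 2; sizes are in the units the table uses (kB).
table2 = {
    "xcv1000":  {"blocks": 32, "block_kB": 0.5, "total_kB": 16},
    "xc2vp20":  {"blocks": 88, "block_kB": 18,  "total_kB": 1584},
    "xc4vlx25": {"blocks": 72, "block_kB": 18,  "total_kB": 1296},
}

# Total block memory is the number of blocks times the size of each block.
computed = {dev: row["blocks"] * row["block_kB"] for dev, row in table2.items()}
consistent = all(computed[dev] == row["total_kB"] for dev, row in table2.items())
```

All three rows are internally consistent, and the roughly 80 to 100 times smaller memory of the Virtex™-I is what makes memory the critical constraint on that device.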


D. FOURIER TRANSFORM COMPUTING

The System Generator toolbox provides several IP blocks that compute the FFT.
Two of these blocks were selected as feasible candidates for computing the FFT for this
design: Fast Fourier Transform v1.0 (FFTv1.0) and Fast Fourier Transform v4.1
(FFTv4.1). This section discusses the advantages and disadvantages of each IP block,
and demonstrates the functionality of each as applied to the design.

1. Fast Fourier Transform v4.1

Within the Virtex™ FPGA family, FFTv4.1 is designed to work with the
Virtex™-5, Virtex™-4, and Virtex™-II Pro. It provides the capability for pipelined,
streaming I/O, which permits the continuous processing of data. For this reason, the
pipelined configuration is preferred if using the FFTv4.1 with this SDR design. The
FFTv4.1 can be configured to compute an FFT of any length N for 8 ≤ N ≤ 65536, where
N is a power of two [15].

a. Configuration Details

Figure 2. FFTv4.1 Implementation [After 16].

Figure  2  shows  the  configuration  options  for  FFTv4.1  and  displays  those 
options  that  were  selected.  The  circuit  produces  natural  order  output,  which  costs 
additional  circuit  resources  and  delay  over  the  bit  reversed  output.  Wright  describes  the 
input and output signals of this circuit in detail [3].

Figure 3. FFTv4.1 IP Block, N=8 [From 3].

The crucial details of this circuit are its input format, output format, and
timing. The input signal must already be separated into its real and imaginary
components. For the purposes of this design, all signals are assumed to be real, so the
imaginary component is hard-wired to binary 0. The input data must be in fixed point
format, with the binary point at D-1 for a data word of width D. The output of the
circuit has a data word width of D + log_2(N) + 1 bits, which prevents overflow while
maintaining the fixed binary point in the same location. As discussed in [15], the input
data must be scaled such that |x[n]| < 1. This configuration uses scaling modules outside
the FFTv4.1 circuit to reduce the output by a factor of 1/N, as shown in Figure 3.
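
The word-width rule can be sketched with two small Python helpers. The input width D = 16 used in the example is an assumption, since the text gives only the general rule, and the names `fft_output_width` and `to_fixed` are hypothetical.

```python
import math

def fft_output_width(input_width, n_points):
    """Output word width D + log2(N) + 1 for an unscaled FFT, per the text."""
    log2_n = int(math.log2(n_points))
    if 2 ** log2_n != n_points:
        raise ValueError("n_points must be a power of two")
    return input_width + log2_n + 1

def to_fixed(value, width):
    """Quantize a value in [-1, 1) to a signed integer code with
    width-1 fractional bits (binary point at D-1), saturating at the limits."""
    scale = 1 << (width - 1)
    code = int(round(value * scale))
    return max(-scale, min(scale - 1, code))

width_out = fft_output_width(16, 1024)      # assumed D = 16, N = 1024
code_half = to_fixed(0.5, 16)               # 0.5 with 15 fractional bits
```

With these assumed values the output word would be 27 bits wide, and the input 0.5 maps to the integer code 16384.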

b.  Circuit  Timing 

As discussed in [3] and [15], the input signal ‘start’ indicates that the
FFTv4.1 circuit should begin accepting input. If start is asserted at time t = 0, the FFT
signals its readiness for input by asserting the ‘ready’ flag and xn_index begins counting
from 0 to N-1, incrementing on each clock. When t = 4, the FFTv4.1 circuit accepts the
first input, x[0]. At t = 1025, xn_index restarts at zero and at t = 1028 the circuit accepts
the first sample of the next set of input data. There is no delay between the input of
sequential data sets.

The signal e_done is asserted at t = 2160, indicating that the circuit will
produce the first X[k] output on the next clock cycle. At t = 2161 the signal done is
asserted, xk_re and xk_im indicate the real and imaginary portions of X[0] respectively,
and xk_index begins counting from 0 to N-1, incrementing on each clock.


Figure  4.  FFTv4.1  response  to  DC  input. 


The number of clock cycles between the last input point and the first
output can be expressed as the latency L = N + 8 clock cycles. There is no delay
between the outputs of sequential data sets. The timing of the FFTv4.1 circuit in
response to a real, Direct Current (DC) signal (x[n] = 0.5 for all n) is illustrated in Figure 4.
The output displayed has not yet been rescaled, so the desired result is shown in Equation
(III.10). In this example, the start signal was only set for one clock cycle. After
computing the first set of N points, the circuit stops accepting input.

$$X[0] = \sum_{n=0}^{N-1} x[n]\, e^{0} = N x[n] = 1024 \cdot 0.5 = 512, \qquad X[k] = 0 \ \text{for } k \neq 0. \tag{III.10}$$
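
The expected DC response can be checked numerically with a brief Python sketch. It uses NumPy's floating-point FFT as a stand-in for the FFTv4.1 core, so it confirms only the arithmetic of the expected result and says nothing about the hardware timing.

```python
import numpy as np

N = 1024
x = np.full(N, 0.5)                  # real DC input, x[n] = 0.5 for all n

X = np.fft.fft(x)                    # unscaled FFT
dc_bin = X[0].real                   # expected to equal N * 0.5
others_max = np.max(np.abs(X[1:]))   # all other bins should be zero
rescaled = dc_bin / N                # after the external 1/N scaling
```

The unscaled DC bin evaluates to 512 and the rescaled value returns to the input amplitude of 0.5.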

The test was re-run with the start signal set high for 4 < t < 2N + 4, with
the results shown in Figure 5. This illustrates the circuit’s ability to accept streaming
input and produce streaming output.


Figure 5. FFTv4.1 response to streaming DC input.

c. Resource Utilization

Table 3 illustrates the resources that the FFTv4.1 uses when the target
device is the Virtex™-4 FPGA and the configuration options are selected with N = 1024,
as shown in Figure 2. This confirms the estimation provided in [15]. The specific FPGA
selected for this synthesis was an xc4vlx25-10sf363 model. The resource estimation was
produced by the Xilinx ISE Project Navigator following synthesis. As discussed in [17],
two slices are used to make a single Configurable Logic Block (CLB) on an FPGA.
Four-input Look-Up Tables (LUTs) are function generators capable of implementing any
Boolean function of four inputs. “Bonded IOB” indicates the number of Input/Output
Blocks that were used. As discussed in [18], “FIFO16/RAMB16” indicates the number
of 18kB Random Access Memory (RAM) blocks that were used. “DSP48” indicates an
arithmetic primitive used for digital signal processing that consists of an 18-bit by 18-bit
multiplier followed by a three-input adder. As discussed in [19], “GCLK” indicates the
number of global clock buffers that were used.


Resource        Used   Available   Percent Used
Slices          4950   10752       46%
Flip Flops      8654   21504       40%
4 input LUTs    6606   21504       30%
Bonded IOBs     132    240         55%
FIFO16/RAMB16   20     72          27%
GCLK            1      32          3%
DSP48           48     48          100%

Table 3. FFTv4.1 Resource Utilization on a Virtex™-4 [From 20].
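
The percentages in Table 3 can be reproduced from the Used and Available columns. The Python sketch below assumes the reported figures are truncated rather than rounded, which is an inference from the table values and not something the thesis states. The data layout and the name `percent_used` are hypothetical.

```python
# (used, available, reported percent) copied from Table 3.
table3 = {
    "Slices":        (4950, 10752, 46),
    "Flip Flops":    (8654, 21504, 40),
    "4 input LUTs":  (6606, 21504, 30),
    "Bonded IOBs":   (132, 240, 55),
    "FIFO16/RAMB16": (20, 72, 27),
    "GCLK":          (1, 32, 3),
    "DSP48":         (48, 48, 100),
}

def percent_used(used, available):
    """Integer percentage, truncated toward zero (assumed reporting rule)."""
    return (100 * used) // available

matches = {name: percent_used(u, a) == p for name, (u, a, p) in table3.items()}
```

Every row matches under truncation. Plain rounding would instead give 31% for the LUTs and 28% for the block RAM. The DSP48 row shows that this configuration consumes every DSP48 primitive on the device.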


Table 4 illustrates the resources that the FFTv4.1 uses when the target
device is the Virtex™-IIP FPGA and the configuration options are selected with
N = 1024, as shown in Figure 2. This information confirms the estimation provided in
[15]. As discussed in [21], “MULT18X18” indicates the number of 18-bit by 18-bit
multiplier primitives available. As discussed in [15], the model must have at least as
many resources as those available in the xc2vp20 series. The specific FPGA selected for
this synthesis was an xc2vpx20-5ff896 model. In this configuration, the phase factor bit
width cannot be greater than 16.


Resource       Used   Available   Percent Used
Slices         3734   9792        38%
Flip Flops     6459   19584       32%
4 input LUTs   4592   19584       23%
Bonded IOBs    115    552         20%
BRAM           16     88          18%
MULT18X18      22     88          25%
GCLK           1      16          6%

Table 4. FFTv4.1 Resource Utilization on a Virtex™-IIP [From 20].


2. Fast Fourier Transform v1.0

From the family of Virtex™ FPGA devices, the FFTv1.0 is designed to work only
on the Virtex™-I. Although it does not provide the capability for a pipelined architecture
with streaming output, it does provide the capability to sample continuously. When using
the triple memory configuration, the circuit compensates for a latency of 3N for each
computation by sampling at one quarter of the clock rate. This is true for
any value of N. [22]

As discussed in [22], the FFTv1.0 is limited to 16-bit arithmetic. As with
FFTv4.1, the input data must be scaled such that |x[n]| < 1. Unlike the FFTv4.1, the word
format of the input propagates to the output. To prevent overflow, the output is
automatically scaled by 1/N [22]. The circuit operation differs slightly between the
features offered by the Xilinx FFTv1.0 core and its realization in the System Generator
environment. These differences are noted in the following description where appropriate.

a. Configuration Details

As shown through a review of [16] and [22], there are differences between
the HDL implementation of the FFTv1.0 IP block and its representation in the System
Generator environment. Although FFTv1.0 provides the same output signals available
from FFTv4.1, some of them are not available in the System Generator representation of
the circuit. The signals e_done, xn_index, xk_index, and busy are not listed as available
outputs, as illustrated in Figure 6. Their absence is not a major obstacle to development,
since the signals used for this SDR design can be replicated using other features. By
inserting one-clock delays in all other signal paths, the done signal can function as the
e_done signal. The indices can be replaced by counters when synchronized with the
valid and done signals.


Figure 6. FFTv1.0 Test Circuit.


The FFTv1.0 circuit offers fewer configuration options in the System
Generator Graphical User Interface (GUI). The System Generator user can only select an
FFT length N such that N is in {16, 64, 256, 1024}, as shown in Figure 7. This suggests that
the IP block uses radix-4 computations because each allowable size of N is a power of
four. The “Memory Usage” option indicates how many N length buffers are used to
manage the data flow within the design.

Figure 7. FFTv1.0 System Generator Configuration Options [After 16].

An  abstract  model  of  the  “Triple  Memory”  configuration  is  shown  in 
Figure  8.  This  architecture  uses  one  buffer  to  store  intermediate  results  during  FFT 
computation.  An  output  buffer  allows  the  previous  computation  to  be  stored  during 
output  while  the  current  one  is  being  processed.  An  input  buffer  allows  a  third  set  of 
points to be sampled during the computation.

Figure 8. FFTv1.0, Triple memory configuration [From 22].

b.  Circuit  Timing 

The FFTv1.0 circuit begins processing inputs when the valid signal is
asserted. After the processing begins, a new set of 1024 (N) input points is sampled
every 4096 clock cycles (4N). Although not used in this design, a synchronous reset
port is available to stop the FFT processing. [22]

If  the  valid  signal  is  asserted  at  t  =  1 ,  the  done  signal  will  be  asserted  at 
t  =  8246 .  When  done  is  asserted,  the  output  signals  Xk_r  and  Xk_i  equal  the  real  and 
imaginary  components  of  X[0] .  The  output  sequences  from  X[0]  to  X[1023]  over  1024 
clock  cycles.  During  this  time,  the  vout  signal  is  asserted  to  indicate  that  the  output  is 
valid.  On  the  next  clock  after  X[1023],  the  vout  signal  goes  to  zero  for  the  next  3072 
clock  cycles,  as  the  circuit  computes  the  next  FFT.  The  first  point  of  the  next  data  set  is 
output at time 12342, 4096 clock cycles after the first point of the previous set. [22]

The FFTv1.0 circuit timing is demonstrated in a test run on a DC input
signal. The valid input signal was asserted with ReIn = 0.5 at time t = 4. No input
signals were changed for the duration of the test. The test results are shown in Figure 9.
The figure illustrates that the circuit produced the expected output,

$$\frac{X[0]}{N} = \frac{1}{N} \sum_{n=0}^{N-1} x[n]\, e^{0} = x[n] = 0.5, \qquad \frac{X[k]}{N} = 0 \ \text{for } k \neq 0. \tag{III.11}$$

As anticipated, after the initial delay the output is valid for the first 1024
clock cycles out of every 4096. The Ready For Data (RFD) signal is asserted for all
time. This indicates that the FFTv1.0 circuit is continuously sampling. [22]
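
The output schedule described in this section can be captured in a small Python model. It assumes the case given in the text where valid is asserted at t = 1 and done is first asserted at t = 8246, with N = 1024. The names `FIRST_DONE`, `output_window`, and `vout` are hypothetical and model only the timing, not the data path.

```python
N = 1024
FRAME_PERIOD = 4 * N     # a new input set is sampled every 4N clock cycles
FIRST_DONE = 8246        # clock on which done is first asserted (from the text)

def output_window(frame):
    """Return (first, last) clock cycle on which output is valid for a frame."""
    first = FIRST_DONE + frame * FRAME_PERIOD
    return first, first + N - 1

def vout(t):
    """Model of the vout flag: high for N clocks out of every 4N after FIRST_DONE."""
    if t < FIRST_DONE:
        return 0
    return 1 if (t - FIRST_DONE) % FRAME_PERIOD < N else 0

second_start = output_window(1)[0]
third_start = output_window(2)[0]
idle_cycles = sum(1 - vout(t) for t in range(FIRST_DONE, FIRST_DONE + FRAME_PERIOD))
```

The model reproduces the output start times of 12342 and 16438 quoted in the text, with 3072 idle cycles between consecutive output bursts.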


After its first assertion, the valid signal indicates whether the sampled
inputs are valid. De-asserting the valid signal indicates that the sampled input is invalid.
If any portion of the sampled input is accompanied by an invalid signal, the FFTv1.0
circuit considers the entire input set to be invalid. In the corresponding output sequence,
vout = 0. In the ISE/ModelSim® environment, invalid output is displayed as X[k] = 0
for all k. In the MATLAB®/Simulink®/System Generator environment, invalid output is
displayed as NaN, indicating that the value cannot be computed as a number.

The performance of the valid input signal is demonstrated in Figure 10.
The input is a real DC signal, where x[n] = 0.5. The input valid signal is asserted at
t = 4, initiating FFT sampling and processing. After one clock cycle, the input valid
signal returns to zero. Since the FFT detects that the input is invalid, the corresponding
output, indicated by the done signal asserted at t = 8246, is invalid, with vout = 0. In the
next input sequence starting at t = 5000, valid is asserted for all samples except for one at
t = 6144. Because one input point was invalid, the corresponding output, starting at
t = 12342, is invalid. The input valid signal is asserted for the remainder of the test, and
the first valid output is produced at t = 16438.
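
The all-or-nothing behavior of the valid signal can be sketched in Python. The model treats the input as consecutive frames of N samples and marks a frame's output valid only if every sample in the frame was flagged valid. The frame length of 4 and the flag pattern are assumptions chosen to mirror the three cases in the test, and the function name is hypothetical.

```python
def frame_outputs_valid(valid_flags, n_points):
    """Return one boolean per complete frame: True only if every sampled
    input in that frame was accompanied by an asserted valid signal."""
    frames = len(valid_flags) // n_points
    return [all(valid_flags[i * n_points:(i + 1) * n_points])
            for i in range(frames)]

N = 4  # small illustrative frame length; the thesis uses N = 1024
flags = [True, False, False, False,   # valid dropped after the first sample
         True, True, False, True,     # a single invalid sample
         True, True, True, True]      # valid held for the whole frame
result = frame_outputs_valid(flags, N)
```

Only the third frame yields valid output, which corresponds to the first valid output appearing on the third output burst in the test.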


[Plot: FFTv1 Timing Signals, showing the Valid, Done, and RFD signals]

Figure 9. FFTv1.0 Response to DC input.

Figure 10. Performance of the valid signal.


c.  Resource  Utilization 

Using the Triple Memory configuration discussed in this section, the
FFTv1.0 circuit uses the resources shown in Table 5. The specific FPGA selected for this
synthesis was an xcv1000-4bg560 model. The resource estimation was produced by the
Xilinx ISE Project Navigator following synthesis. The FFTv1.0 IP was configured with
N = 1024, as shown in Figure 7.

Resource       Used   Available   Percent Used
Slices         1289   12288       10%
Flip Flops     2577   24576       10%
4 input LUTs   2245   24576       9%
Bonded IOBs    70     404         17%
BRAM           24     32          75%
GCLK

Table 5. FFTv1.0 Resource Utilization on a Virtex™-I [From 20].


d.  Circuit  Limitations 


Designers using the FFTv1.0 circuit need to be conscious of the
limitations on precision imposed by the combination of 16-bit computations, propagating
the input signal format to the output, and the scaling of the output signal by 1/N.

If a maximum-precision output is desired, its format must be a 16-bit
number with the radix point at 15. Using this format, the smallest detectable output is

$$0.000000000000001_2 = (1/2^{15})_{10} \approx 3.0518 \times 10^{-5}. \tag{III.12}$$

The input must be in the same format, so the smallest possible input signal is the same
size if the full dynamic range of the input signal is used. Suppose an impulse signal is
input to the FFTv1.0 circuit, where

$$x[n] = 0.000001_2 = (2^{-6})_{10} \approx 0.0156 \ \text{for } n = 0, \qquad x[n] = 0 \ \text{for } n = 1, \ldots, N-1. \tag{III.13}$$

In this case, the corresponding output should be

$$\frac{X[k]}{N} = \frac{1}{N} \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi k n / N} = \frac{1}{N} \left( 0.0156\, e^{0} + \sum_{n=1}^{N-1} 0 \times e^{-j 2\pi k n / N} \right) = \frac{0.0156}{1024} = (1.5234 \times 10^{-5})_{10} \approx 0.0000000000000001_2. \tag{III.14}$$

This is smaller than the minimum detectable output signal, so the output would appear to
be X[k] = 0 for all k.


In this configuration, if it is desired that all signals propagate through the
circuit to produce valid output, then the input signal is limited to 6 bits with 5 bits
representing the fraction. In this case, the smallest possible input value is 2^-5 ≈ 0.0313.
The user may decide to use a larger range than this, but must be aware of the fact that not
all signals will propagate through the circuit to produce a valid output.
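
The precision limit can be worked through numerically in Python. The sketch assumes a 16-bit word with 15 fractional bits and that values smaller than one least significant bit are truncated to zero. The truncation rule is an assumption, since the text says only that such an output would appear to be zero, and the function names are hypothetical.

```python
import math

FRAC_BITS = 15
LSB = 2.0 ** -FRAC_BITS          # smallest representable step, about 3.0518e-5

def quantize(value):
    """Truncate toward zero to a multiple of the LSB (assumed behavior)."""
    return math.trunc(value / LSB) * LSB

def scaled_impulse_output(amplitude, n_points):
    """The DFT of an impulse at n = 0 is flat, so after the 1/N scaling
    every output bin equals amplitude / N before quantization."""
    return quantize(amplitude / n_points)

N = 1024
lost = scaled_impulse_output(2.0 ** -6, N)   # 2^-16 falls below one LSB
kept = scaled_impulse_output(2.0 ** -5, N)   # 2^-15 is exactly one LSB
min_input = LSB * N                          # smallest impulse that survives
```

The 2^-6 impulse is lost entirely, the 2^-5 impulse survives as a single least significant bit, and the minimum surviving amplitude works out to 2^-5 = 0.03125 for N = 1024.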


E. CONCLUSION


This  chapter  discussed  the  computing  resources  required  to  implement  the  SDR 
design.  The  mathematical  basis  for  the  FFT  was  discussed.  Software  tools  were 
introduced,  as  well  as  target  FPGA  devices.  The  chapter  demonstrated  the  function  of  IP 
circuitry  available  to  compute  the  FFT.  The  next  chapter  discusses  the  memory 
requirements and timing details of the initial SDR design.

III. INITIAL SDR DESIGN


This chapter reviews and augments the work documented in [3]. The referenced
NPS thesis illustrates the performance of an SDR compression algorithm under various
test scenarios. These demonstrations focused on the circuit’s output as a function of the
input. While this information is useful to the circuit’s end user, it lacks a level of detail
needed for future development of the circuit. This chapter traces the design’s signal path,
highlighting signals and configurations critical to the circuit’s function. It also illustrates
the intermediate responses of the system subcomponents as the signal propagates from
the input to the output.

Although some versions of the original design were tested in hardware, a detailed
resource analysis was not conducted. The most critical resource constraint for the SDR
compression algorithm is the amount of memory available on the target FPGA. This
chapter discusses the resource requirements of the original design. It also discusses the
capabilities and limitations of the circuit, based on the resources available using various
configurations and target devices. The chapter concludes with some general equations
describing the circuit’s overall performance and resource requirements. These equations
are useful tools in configuring the circuit for optimal performance.

A. OVERALL FUNCTIONALITY

The initial SDR design calculates the FFT of a sampled, pre-demodulated,
Intermediate Frequency (IF) signal. A conceptual model of the system is shown in
Figure 11. The output FFT points are stored in temporary memory while the Bin Energy
Calculation subsystem sorts the signal into time and frequency bins. This subsystem
computes the amount of energy in each bin [3].

Figure 11. Conceptual SDR Model [From 3].

The  Bin  Threshold  Analysis  subsystem  determines  if  the  energy  in  each  bin 
exceeds  a  user-defined  threshold.  The  Data  Management  subsystem  pulls  FFT  points 
from  memory  that  correspond  to  bins  exceeding  the  minimum  energy  threshold  and  adds 
header  information  to  enable  the  reconstruction  of  the  signal  after  downlink  [3]. 
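
The selection logic described above can be summarized in software. The following Python sketch is illustrative only: the function name `compress_window`, the inclusive `(start, end)` ROI layout, and the dictionary used as a header are assumptions made for this example and do not reflect the hardware implementation in [3].

```python
def compress_window(fft_frames, rois, threshold):
    """Select FFT points for downlink from one time window.

    fft_frames: list of M lists, each holding N complex FFT points.
    rois:       list of (start, end) index pairs, end inclusive (assumed layout).
    threshold:  user-defined minimum bin energy.
    """
    n = len(fft_frames[0])
    # Bin Energy Calculation: per-point energy summed over the M FFT periods.
    energy = [sum(abs(frame[k]) ** 2 for frame in fft_frames) for k in range(n)]
    # Energy in each frequency bin (ROI).
    bin_energy = [sum(energy[s:e + 1]) for s, e in rois]
    # Bin Threshold Analysis: keep bins whose energy exceeds the threshold.
    passed = [i for i, e in enumerate(bin_energy) if e > threshold]
    # Data Management: header information, then the stored FFT points.
    header = {"num_passed": len(passed), "bins": passed}
    points = [frame[k]
              for i in passed
              for frame in fft_frames
              for k in range(rois[i][0], rois[i][1] + 1)]
    return header, points
```

For each bin that passes, the sketch returns M x F(x) stored FFT points, where F(x) is the number of FFT points in that bin.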

B. DESIGN SETUP

This chapter focuses on all circuitry after the calculation of the FFT. The majority
of the flow path is linear, although a limited number of signals loop back, providing input
to algorithms earlier in the flow path. No internal signals govern the function of the FFT
circuit. The overall circuit design is displayed in Figure 12. This shows that the
compression portion of the circuit reacts to stimuli from the FFT circuit. The FFT circuit
only reacts to stimuli external to the circuit.

Figure 12. Overall Circuit Design [From 3].


The specific configuration discussed in this chapter uses the FFTv4.1 IP circuit.
The FFT circuit computes the 1024-point FFT of a real signal. The data word format will
be referred to using the notation BitWidth_DecimalPoint. The input signal had a width of
24 bits, with the decimal point to the left of bit 23. In other words, the data word format
was 24_23. As discussed in Section 2.D.1, the output bit format is calculated to be

$$\text{BitWidth}_{output} = \text{BitWidth}_{input} + \log_2 1024 = 34\_23. \tag{IV.1}$$

In the initial SDR design, the output of the FFT is then reformatted, scaling the signal by
a factor of $2^{-10}$ and prepending a zero to the most significant bit. The signal that enters the
Bin Energy Calculation block has a data word format of 35_33.
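
The format arithmetic above can be checked with a short Python sketch. The helper names are hypothetical. The sketch assumes that the FFT adds $\log_2 1024 = 10$ bits to the 24-bit input, and that the reformatting step prepends one zero bit and moves the binary point ten places, which is what the stated 35_33 format implies.

```python
import math

def fft_output_format(width, point, n):
    """Format growth of the FFT output: log2(n) integer bits are added."""
    return width + int(math.log2(n)), point

def reformat(width, point, shift):
    """Prepend one zero bit and move the binary point left by `shift` bits,
    which scales the value by 2**-shift without changing the stored bits."""
    return width + 1, point + shift

fft_out = fft_output_format(24, 23, 1024)   # (34, 23)
energy_in = reformat(*fft_out, shift=10)    # (35, 33)
print(fft_out, energy_in)
```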

The start signal for the FFTv4.1 circuit is set at time t = 1. The first input point
x[0] is sampled at t = 1 and delayed four clock cycles, entering the FFT circuit at t = 5.
After the initial latency discussed in Section 2.D.1 and a 48 clock delay, the first output
point X(0) reaches the Bin Energy Calculation block at time t = 2209.

C. BIN ENERGY CALCULATION

1. Time Windowing Subsystem

The function of the Time Windowing subsystem is to calculate the amount of
energy in each FFT output point, then sum the amount of energy in each point over a time
period determined by the user. The time period must be in integer multiples of the FFT
period. The signal M is the number of FFT periods considered for each time window [3].


a.  Signal  Flow 


As discussed in [3], the energy in each point is calculated using the expression

$$\text{Energy}(k) = \text{Re}(k)^2 + \text{Im}(k)^2. \tag{IV.2}$$

This is implemented using two multipliers and one adder, as shown in Figure 13.

Figure 13. Circuit to Calculate Energy(k) [From 3].


System Generator provides a circuit designer with the option to set the output precision
and latency of the addition and multiplication blocks. By double-clicking on the block, the
designer can force the output data word into a desired format using a graphical interface
shown in Figure 14. The latency can also be adjusted in terms of clock cycles.

Figure 14. Multiplier IP Configuration [From 16].

Increasing the latency allows the compiler to implement the specified
adder or multiplier circuit using a pipeline with the number of stages specified in the
latency block. As discussed in [23], pipelining breaks up the circuit so that the longest
delay path between registers is reduced. This allows the overall clock speed to be
increased, which increases the sampling rate and therefore the highest analog frequency that
the FFT can measure. No pipelining was used for any addition or multiplication blocks
in the initial SDR design.

In the initial SDR design, the output precision for this energy calculation
was set to “Full.” In this configuration, bits are added to the output data word to preclude
the possibility of overflow. The number of bits is doubled in each multiplier,
transforming the 35_33 input signal into an output signal with data word format 70_66.
The full precision adder appends one bit to the signal, resulting in a data word format of
71_66.
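
The bit growth described above follows two simple rules, sketched here in Python. The function names are hypothetical, and the adder rule is written only for operands whose binary points are already aligned, as they are in this circuit.

```python
def full_precision_multiply(a, b):
    """Full-precision product format: bit widths add and fraction bits add."""
    return a[0] + b[0], a[1] + b[1]

def full_precision_add(a, b):
    """Full-precision sum format for aligned operands: one extra bit
    absorbs the carry, and the binary point does not move."""
    assert a[1] == b[1], "sketch only handles aligned binary points"
    return max(a[0], b[0]) + 1, a[1]

squared = full_precision_multiply((35, 33), (35, 33))   # Re(k)^2 or Im(k)^2
energy = full_precision_add(squared, squared)           # Re(k)^2 + Im(k)^2
print(squared, energy)
```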

The flow path for the time window energy calculation is shown in Figure
15. As discussed in [3], the energy signal is stored in a First-In, First-Out (FIFO) buffer.
As successive sets of N output points are received from the energy calculation circuit,
each new value is added to the corresponding point’s energy accumulation from the
previous time periods. The sum is written back into the FIFO buffer provided the number
of FFTs processed is less than M, the number of FFT periods in a time window.
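
The accumulate-and-write-back behavior can be modeled in software as follows. This is a behavioral sketch only: the generator name and the use of a Python `deque` as the FIFO are assumptions, and the model ignores the clock-level delays of the hardware.

```python
from collections import deque

def time_window_energy(energy_stream, n, m):
    """Yield accumulated energies, point by point, once per m FFT periods."""
    fifo = deque([0.0] * n)            # one running sum per FFT point
    for count, e in enumerate(energy_stream):
        acc = fifo.popleft() + e       # add new energy to the stored sum
        period = (count // n) % m      # FFT period index within the window
        if period < m - 1:
            fifo.append(acc)           # fewer than m FFTs processed: write back
        else:
            fifo.append(0.0)           # window complete: restart the sum
            yield acc                  # forward the total to the next subsystem
```

With N = 2 and M = 3, the input stream 1, 2, 3, 4, 5, 6 produces the two sums 1 + 3 + 5 and 2 + 4 + 6.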


Figure  15.  Time  Window  Energy  Calculation  [From  3]. 

The adder used for this circuit is configured with a latency of zero clock
cycles, indicating that pipelining is not used. The output precision cannot be set to “Full”
because the circuit’s input is a function of its output. When System Generator attempts to
compile the circuit in this configuration, it generates an error because it assumes the size of
the adder’s output data word would grow without bound. In the initial SDR design, the
adder’s output precision was fixed at 54_42.

b. Timing Analysis


As discussed earlier, in the initial SDR design the latency of the adder and
multiplier circuits is zero. The amount of time spent processing FFT points stored in the
FIFO is dependent on the value of M. The circuit must wait until N(M - 1) points have
been collected, with the summed energy vector calculated and stored in the FIFO.

When the Time Windowing subsystem receives the first point of the final
FFT period within the window, the output of the adder can be forwarded to the next SDR
subsystem. The circuit has an additional two-clock delay between the time an input is
received and the time it is available for output, which accounts for the inherent delay
associated with the FIFO buffer. For a case where M = 3, the first point of the third FFT
period would enter the Time Windowing subsystem at time t = 2209 + 2N = 4257. The
first output point, indicating $|X_0(0)|^2 + |X_1(0)|^2 + |X_2(0)|^2$, leaves the Time Windowing
subsystem at time t = 4257 + 2 = 4259. The last output point, indicating the sum
$|X_0(1023)|^2 + |X_1(1023)|^2 + |X_2(1023)|^2$, is forwarded to the Frequency Windowing
subsystem at time t = 4259 + 1023 = 5282.
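
The three clock times in this example follow directly from N, M, and the two-clock FIFO delay. The Python sketch below reproduces them. The function and parameter names are hypothetical, and the default starting time of 2209 is the value quoted above for this configuration.

```python
def time_window_output_times(n, m, t_first_fft_out=2209, fifo_delay=2):
    """Clock times for the final FFT period of one time window."""
    t_in = t_first_fft_out + (m - 1) * n   # first point of the final FFT period
    t_first = t_in + fifo_delay            # first summed point leaves the subsystem
    t_last = t_first + (n - 1)             # one point per clock thereafter
    return t_in, t_first, t_last

print(time_window_output_times(1024, 3))   # (4257, 4259, 5282)
```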

The signal flow through this subsystem is managed by a Finite State
Machine (FSM) implemented in M-Code by the pwr_time algorithm. The circuit’s input
and output signals are described in [3]. In addition, the circuit uses internal variables
fft_count, i_cnt, and delay_fl to control state transitions and output timing. For this
subsystem, the signal p_cnt is used to indicate the signal xn_index, from the FFT circuit.
The signal prep is used to indicate the edone flag from the FFT circuit.

Figure 16. State Transition Diagram for pwr_time Algorithm.


A state transition diagram for the pwr_time algorithm is displayed in
Figure 16. The state transitions and output for the pwr_time FSM are displayed in Moore
format, with the output solely a function of the current state [23]. In truth, the FSM is
implemented in Mealy format, meaning that the output is a function of both the current
state and the inputs [23]. When implemented in hardware, the internal variables i_cnt
and delay_fl would function as inputs to the FSM, changing the way each state produces
output.

The effect of internal signals on the output for a case where M = 3 is
illustrated in Figure 17. The multiplexer which controls the input to the FIFO buffer is
governed by the mem_mux signal. The FIFO input is either from the FFT circuit,
indicated by “FFT,” or from the adder circuit, indicated by “Add.” The multiplexer
which controls the input to the adder is governed by the add_mux signal. The adder input
is either directly from the FFT circuit, indicated by “FFT,” from the FFT circuit with a
delay of one, indicated by “Del,” or simply zero. The first two clock cycles of states two
and three have different output, disabling the ability to write to the FIFO buffer until the
adder output is ready.

Figure 17. Timing of Output for pwr_time Algorithm, N = 1024, M = 3.


The FSM is designed for continuous, streaming input. This is the reason
for State Four, which permits processing of the next time window while the last two
points from the previous window are output. After State One, all transitions are based on
the internal variables i_cnt and fft_count. Regardless of the input received in successive
FFT sets, the algorithm continues processing until it reaches the end of State Three. If
the expected prep flag is not detected at this point, the algorithm returns to State Zero.
Additionally, the time_st and time_end flags are only set on the first and last clocks of
State Three, respectively.


c.  Memory  Analysis 


The  biggest  resource  constraint  for  the  Time  Windowing  subsystem  is  the 
amount  of  memory  required  for  the  FIFO  buffer.  The  memory  required  is  a  function  of 
the  bit  width  of  the  data  word  and  the  depth  of  the  memory.  System  Generator  selects  the 
bit  width  of  memory  based  on  the  bit  width  of  the  input.  The  depth  of  memory,  which 
indicates  the  maximum  number  of  data  words  that  can  be  stored,  can  be  adjusted  by  the 
circuit  designer  using  a  pull-down  menu,  where 

$$\text{Depth} = 2^n \text{ for } n = 4, 5, \ldots, 16. \tag{IV.3}$$

After  the  values  from  the  first  N  points  are  stored,  the  FIFO  buffer  reads  a  value  on  every 
clock  cycle.  Therefore,  the  FIFO  buffer  only  requires  enough  depth  to  contain  N  values. 
The  initial  SDR  design  sets  the  FIFO  depth  to  “4K.”  The  corresponding  memory 
requirement  is 


$$\begin{aligned}\text{Memory} &= \text{Bit Width} \times \text{Depth} \\ &= 71 \text{ bits} \times 4\text{K} = 284 \text{ Kbit} = 35.5 \text{ KB}\end{aligned} \tag{IV.4}$$
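
Equation (IV.4) can be reproduced, and compared with the smallest selectable depth that still holds N = 1024 values, using the following Python sketch. The helper names are hypothetical. The units follow the text: 1 Kbit = 1024 bits and 1 KB = 1024 bytes.

```python
def memory_bits(bit_width, depth):
    """Memory = Bit Width x Depth, in bits."""
    return bit_width * depth

def smallest_fifo_depth(n, exponents=range(4, 17)):
    """Smallest selectable depth 2**e, per Equation (IV.3), that holds n values."""
    return next(2 ** e for e in exponents if 2 ** e >= n)

design_bits = memory_bits(71, 4 * 1024)                     # depth set to "4K"
minimum_bits = memory_bits(71, smallest_fifo_depth(1024))   # depth of N

print(design_bits / 1024, design_bits / 8 / 1024)   # 284.0 Kbit, 35.5 KB
print(minimum_bits / 1024)                          # 71.0 Kbit
```

Under these assumptions, the "4K" setting uses four times the memory that a depth of N would require.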


2.  Frequency  Windowing  Subsystem 


As discussed in [3], the Frequency Windowing subsystem sorts the accumulated
energy vector passed from the Time Windowing subsystem into bins representing
frequencies of interest. At this point, elements of the energy vector that are not within the
range of a frequency bin are not forwarded for further analysis. The subsystem uses an
accumulator to compute the amount of energy in each frequency bin.

a. Timing Analysis

The signals in this subsystem are controlled by two finite state machines.
As each element of the energy vector is output from the Time Windowing subsystem, it is
stored sequentially in a Dual Port RAM. This process is controlled by the we_time_win
algorithm. When the last point is written, the algorithm sets the fft_e flag. This signals
the re_freq_win algorithm to read the appropriate Ranges of Interest (ROIs) from
memory. State transition diagrams for each algorithm are shown in Figure 18.

Figure 18. State Transition Diagrams for Frequency Analysis Subsystem.

The rng_s and rng_e signals align with the start and end of each ROI,
respectively. They are generated along with the addresses of the start and end points and
then delayed by one clock to align with the start and end points as they are available from
memory. These signals are used to control a series of accumulators, which add each
sequential input to the value stored on the last clock cycle. The value in the accumulator
is reset at the end of each ROI. The bin_fl signal is set for one clock cycle at the end of
the last ROI. This signal is delayed by two clock cycles to align with the data signal as it
leaves the accumulators and is renamed win_fl.

From this point forward, the timing of the circuit is dependent on the ROI
selected and the input signal. To demonstrate the circuit’s timing, a test was run with a
real input signal, where

$$x[n] = \tfrac{1}{4}\left(\sin(2\pi n \times 3/1024) + \sin(2\pi n \times 5/1024)\right) \text{ for } n = 0 \ldots 3095. \tag{IV.5}$$

As expected from the DFT calculation, the corresponding output from the FFT subsystem
is

$$\begin{aligned}\text{Re}(k) &= 0 \quad \forall k \\ \text{Im}(k) &= -128 \text{ for } k = 3, 5 \\ &= 128 \text{ for } k = 1019, 1021\end{aligned} \tag{IV.6}$$

The energy vector output from the Time Windowing Subsystem, with M = 3, is

$$E(k) = \begin{cases} 3 \times \left(0^2 + \left(\dfrac{128}{1024}\right)^2\right) = 0.047 & \text{if } k = 3, 5, 1019, \text{ or } 1021 \\ 0 & \text{otherwise} \end{cases} \tag{IV.7}$$

The ROIs used for this test were k = 0...7 and k = 1019...1023. The expected energy in
each frequency bin is 2 x 0.047 = 0.094. For convenience, F(x) is defined as the total
number of FFT points in ROI(x). In this example there are two ROIs, where F(0) = 8
and F(1) = 5.
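
The expected values for this test can be checked numerically. The Python sketch below evaluates the DFT directly at the bins inside the two ROIs. It rests on two assumptions that are labeled in the code: a sinusoid amplitude of 1/4, which is inferred from the 128/1024 term in Equation (IV.7), and a scaling of the FFT output by $2^{-10}$ before the energy calculation. All names are hypothetical.

```python
import cmath
import math

N, M = 1024, 3
AMPLITUDE = 0.25   # assumption: inferred from the 128/1024 term in (IV.7)

x = [AMPLITUDE * (math.sin(2 * math.pi * 3 * n / N) +
                  math.sin(2 * math.pi * 5 * n / N)) for n in range(N)]

def dft_bin(signal, k):
    """Direct evaluation of one DFT output point."""
    size = len(signal)
    return sum(signal[n] * cmath.exp(-2j * math.pi * k * n / size)
               for n in range(size))

def point_energy(k):
    """Energy of point k summed over M identical FFT periods."""
    scaled = dft_bin(x, k) / N         # assumption: 2**-10 scaling
    return M * abs(scaled) ** 2

rois = [range(0, 8), range(1019, 1024)]
bin_energy = [sum(point_energy(k) for k in roi) for roi in rois]
print([round(e, 3) for e in bin_energy])
```

Because both tones complete a whole number of cycles in 1024 samples, every FFT period in the window is identical, so the per-point energy is simply M times the energy of one period.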


A timing diagram for the test is shown in Figure 19. The first ROI point
is read from memory N + 1 clock cycles after the first energy point is written. In this
example, the first energy point is written at t = 4259 and the first ROI point is read from
memory at t = 5284. The sum of energy in each bin is available after every element of
the energy vector is entered into memory, the entire ROI has been read, and the signal has
left the accumulator. This sum is aligned with the valid signal, which is the same as the
rng_e signal delayed by two clock cycles. The time each ROI energy sum is available
can be expressed as

$$t\left\{\sum ROI(y)\right\} = \text{FFT delay} + \underbrace{(M-1)N + 2}_{\text{Time Windowing Delay}} + \underbrace{N + 1}_{\text{Write Delay}} + \underbrace{\sum_{x=0}^{y} F(x)}_{\text{Read Delay}}. \tag{IV.8}$$

Figure 19. Timing of Frequency Windowing Subsystem.

For the two ROI in this test, ROI(0) and ROI(1), the time that their energy is available is
expressed as

$$\begin{aligned} t\left\{\sum ROI(0)\right\} &= 2209 + 2 \times 1024 + 2 + 1024 + 1 + 8 = 5292 \\ t\left\{\sum ROI(1)\right\} &= 2209 + 2 \times 1024 + 2 + 1024 + 1 + 8 + 5 = 5297 \end{aligned} \tag{IV.9}$$

This corresponds with the test results displayed in Figure 19. For convenience, the time
of the last bin energy calculation in a set will be annotated as $t_{ROI(final)}$.
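
The arithmetic in Equations (IV.8) and (IV.9) can be expressed as a small Python function. The function and argument names are hypothetical. The terms are grouped as they appear in the worked example: the time windowing delay, the N + 1 clocks before the first ROI point is read, and the number of ROI points read so far.

```python
def roi_sum_time(fft_delay, n, m, roi_sizes, y):
    """Clock time at which the energy sum of ROI(y) is available."""
    time_windowing = (m - 1) * n + 2       # accumulate M periods, FIFO delay
    write_then_first_read = n + 1          # store N points, then start reading
    points_read = sum(roi_sizes[: y + 1])  # F(0) + ... + F(y)
    return fft_delay + time_windowing + write_then_first_read + points_read

roi_sizes = [8, 5]                         # F(0) and F(1) from the test
print(roi_sum_time(2209, 1024, 3, roi_sizes, 0))   # 5292
print(roi_sum_time(2209, 1024, 3, roi_sizes, 1))   # 5297
```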


b.  Resource  Analysis 


Similar to the FIFO buffer in the Time Windowing subsystem, the size of
the Dual Port RAM used in this subsystem is dependent on the bit width of the data word
and the depth of the memory. As discussed in the Time Windowing section, the output
from the adder is in 52_42 format. This format propagates to the input of the Dual Port
RAM, setting the bit width for the block at 52. As discussed in [24], the maximum bit
width allowable is dependent on the depth of memory and the device.

The depth of the Dual Port RAM block is entered using a fill-in block in
the System Generator user interface. For the initial SDR design, this value was $2^{15}$. The
basis for this was most likely determined by the expression

$$\begin{aligned}\text{Depth} &= N \times (M + 1) \times mem\_col \\ &= 2^{10} \times 2^{2} \times 2^{3} = 2^{15}\end{aligned} \tag{IV.10}$$

The corresponding memory requirement is

$$\begin{aligned}\text{Memory} &= \text{Bit Width} \times \text{Depth} \\ &= 52 \times 2^{15} = 1664 \text{ Kbit} = 208 \text{ KB}\end{aligned} \tag{IV.11}$$


This configuration of the initial SDR design was only tested at a high level
of abstraction in the MATLAB®/Simulink® environment to verify that the algorithm
would function as designed. The algorithm cannot run on a device in its current format
because this memory requirement far exceeds the capacities of the target devices. The
expression in Equation (IV.11) does not represent the minimum memory requirement, so
the memory can be used much more efficiently with some adjustments.

Since the Dual-Port RAM only stores one energy vector of length N for
every M FFT periods, the factor (M + 1) in Equation (IV.10) can be replaced with (1).
The re_freq_win algorithm has no provision for reading any vector other than the one just
written to memory. In a worst-case scenario with continuous, streaming FFT output and M = 1, the
Dual-Port RAM would require enough depth to write the next vector of N elements while
frequency bins from the last one are being read. Assuming $\sum F(x) \le MN$, the maximum
required depth for a 1024-point FFT is

$$\text{Depth} = 2N = 2 \times 2^{10} = 2^{11}. \tag{IV.12}$$

As discussed in [3], this can be accomplished by changing the value in the Dual-Port
RAM System Generator interface, setting mem_col = 2 and adjusting the bit width of the
addr_hi signal to one. The new memory requirement is

$$\begin{aligned}\text{Memory} &= \text{Bit Width} \times \text{Depth} \\ &= 52 \times 2^{11} = 104 \text{ Kbit} = 13 \text{ KB}\end{aligned} \tag{IV.13}$$
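
The saving can be quantified with a short Python sketch. The function name is hypothetical. The sketch also assumes that the original depth of $2^{15}$ factors as N = $2^{10}$, (M + 1) = $2^2$, and mem_col = $2^3$. Only the product is confirmed by the 1664 Kbit figure, so the individual factors should be read as an illustration.

```python
def dual_port_ram_bits(bit_width, n, period_factor, mem_col):
    """Memory = Bit Width x Depth, with Depth = N x factor x mem_col."""
    return bit_width * n * period_factor * mem_col

# Assumed factorization of the original 2**15 depth (only the product is certain).
original = dual_port_ram_bits(52, 1024, period_factor=4, mem_col=8)
# Reduced configuration: (M + 1) replaced with 1 and mem_col = 2.
reduced = dual_port_ram_bits(52, 1024, period_factor=1, mem_col=2)

print(original / 1024, original / 8 / 1024)   # 1664.0 Kbit, 208.0 KB
print(reduced / 1024, reduced / 8 / 1024)     # 104.0 Kbit, 13.0 KB
print(original // reduced)                    # 16
```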

The assumption that $\sum F(x) \le N$ is not necessary for this portion of the
circuit. Based on the timing, if $\sum F(x) \le NM$, the circuit would still function as long as
the memory is modified so that $\text{Depth} \ge NM$. Setting limits on $\sum F(x)$ is necessary to
ensure that the circuit works with the minimum required memory. Additionally, the
$\sum F(x) \le N$ restriction is required later in the signal flow path. This will be revisited
later in the chapter.

D. BIN THRESHOLD ANALYSIS AND DATA MANAGEMENT

The Bin Threshold Analysis subsystem simply compares the total energy in each
bin to the user-defined threshold. If the energy in the bin exceeds the threshold, the ROI
index is written to a FIFO buffer in the Temporary Data Management subsystem. At the
end of the bin set, the window number and the number of bins that passed are also stored
in FIFO buffers. These buffers are duplicated in the Header Generation subsystem.

1. Timing Analysis


As each of the bins is analyzed, the wind_anal algorithm sets the wE_rng signal,
which permits the ROI index to be written to memory. At $t_{ROI(final)}$, the algorithm sets the
wE_qty signal, permitting the window number and number of passed bins to be written to
memory. There is no clock delay between the accumulator output from the previous
subsystem and memory.

The wE_qty signal also acts as a flag for the hdr_data_mgt algorithm. If the wE_qty
flag is asserted, the algorithm checks to ensure that a previous window is not being read
from memory. If the tmp_busy flag is not asserted, the algorithm sets the hdr_fl flag.
After a two-clock delay, this cues the out_hdr algorithm to begin generating a header for
the downlink data frame.

A state transition diagram for the out_hdr algorithm is shown in Figure 20. State
Zero outputs the window number as the first part of the header before transitioning to the
next state. In State One, the algorithm outputs the number of bins that passed the bin
threshold analysis, then transitions to State Two. In State Two, the algorithm outputs the
status of the pri flag, then transitions to State Three. The length of the final part of the
output header is dependent on the number of bins that passed the bin threshold analysis.
The algorithm remains in State Three, adding a bin number to the header on each clock
cycle, until all bins that passed the bin threshold analysis are added to the header. When
all of the appropriate bins have been added to the header, the algorithm sets the tmp_fl
signal and returns to State Zero. The tmp_fl signal cues the re_tmp algorithm to read FFT
points from temporary memory, sending them to the Output Format subsystem.

Figure 20. State Transition Diagram for out_hdr Algorithm


For convenience, the time the first element of the header is sent to the Output
Format subsystem will be annotated as $t_{hdr}$. The time that the first FFT point is read from
temporary memory will be annotated as $t_{tmp}$. The delay between $t_{ROI(final)}$ and $t_{hdr}$ is two
clock cycles. The delay between $t_{ROI(final)}$ and $t_{tmp}$ is expressed as

$$t_{tmp} - t_{ROI(final)} = 2 + \underbrace{3 + \#\,\text{Passed Bins}}_{\text{Header Elements}} + 2. \tag{IV.14}$$
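
The header sequence and the delay in Equation (IV.14) can be modeled as follows. This Python sketch is a behavioral illustration only: the function names are hypothetical, and the header is represented as a simple list with one element per clock cycle.

```python
def out_hdr_sequence(window_number, passed_bins, pri_flag):
    """Header elements in the order emitted by States Zero through Three."""
    header = [window_number,       # State Zero: window number
              len(passed_bins),    # State One: number of passed bins
              pri_flag]            # State Two: status of the pri flag
    header.extend(passed_bins)     # State Three: one bin number per clock
    return header

def t_hdr(t_roi_final):
    """First header element: two clocks after the last bin energy sum."""
    return t_roi_final + 2

def t_tmp(t_roi_final, num_passed_bins):
    """Equation (IV.14): first FFT point read from temporary memory."""
    header_elements = 3 + num_passed_bins
    return t_roi_final + 2 + header_elements + 2

header = out_hdr_sequence(window_number=0, passed_bins=[0, 1], pri_flag=0)
print(len(header), t_hdr(5297), t_tmp(5297, 2))
```

The usage line supposes, purely for illustration, that both bins from the earlier timing example pass the threshold.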

The additional two-clock delay at the end of the expression indicates the time required to
read the ROI from temporary memory then read the first FFT point from temporary
memory. The algorithm reads MF(x) points from temporary memory. After each ROI,
there is a one-clock delay so the algorithm can read the next ROI from temporary
memory.

2. Resource Analysis


The only memory requirements for these subsystems are the two sets of three
FIFO buffers used to store ROI information. The initial SDR design sets the depth of
these buffers to 16. This sets the maximum number of ROIs that can be passed as 16.
The bit width for each value stored is five. The total memory requirement for this
subsystem is

$$\begin{aligned}\text{Memory} &= 2 \times 3 \times \text{Depth} \times \text{Bit Width} \\ &= 6 \times 16 \times 5 = 480 \text{ bits} = 60 \text{ Bytes}\end{aligned} \tag{IV.15}$$

This amount of memory is very small compared with the total memory available and the
other memory requirements of the circuit. It is not considered in approximations.

The re_tmp algorithm finds the values in temporary memory by using the current
relative time window as an index, where each indexed time window has MN elements.
The re_tmp algorithm determines the current time window index from the win_num
signal, which is incremented after every bin set by the wind_anal algorithm. The signal
is reset when win_num == mem_col - 1. Although this has no impact on the memory
requirements of this subsystem, this sets a key constraint on the amount of temporary
memory required. The memory depth must be an integer multiple of MN.

E.  TEMPORARY  STORAGE  AND  OUTPUT  CONTROL 

1. Temporary Storage Subsystem

The two Dual-Port RAM blocks in this subsystem constitute the largest memory
requirement for the overall circuit. The memory must be able to store NM points while
the circuit determines which points to downlink. Since the temporary memory must
continue storing additional FFT points while the energy is being calculated, additional
space is required. This amount is dependent on the amount of time required to compute
the energy and downlink points of interest. After all points of interest from a given time
window have been read, the memory that stored the time window can be overwritten.

As discussed in previous sections, the timing of the circuit’s output is dependent
on $\sum F(x)$ and the number of bins that pass the bin threshold analysis. By setting a
restriction that $\sum F(x) \le N$, a worst-case scenario can be examined where $\sum F(x) \approx N$
and every bin passes the bin threshold analysis. Any additional overhead can be
neglected, provided $N \gg 16$. In this approximation, the minimum memory depth
required is

$$\begin{aligned}\text{Temp Memory Depth} &= \underbrace{(M-1)N}_{\text{Time Window}} + \underbrace{N + \sum_{x=0}^{\#ROI-1} F(x)}_{\text{Divide into bins}} + \underbrace{M \times \sum_{x=0}^{\#ROI-1} F(x)}_{\text{Read passed bins from memory}} \\ &= MN + N + MN = N(2M+1)\end{aligned} \tag{IV.16}$$
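
Equation (IV.16) can be evaluated for any ROI configuration with the following Python sketch. The function name is hypothetical, and the grouping of terms follows the labels in the equation. The worst case, in which the total ROI size equals N, reduces to N(2M + 1).

```python
def temp_memory_depth(n, m, roi_sizes):
    """Minimum temporary memory depth per Equation (IV.16)."""
    total_roi = sum(roi_sizes)
    time_window = (m - 1) * n          # accumulate the first M - 1 FFT periods
    divide_into_bins = n + total_roi   # store the energy vector, read the ROIs
    read_passed_bins = m * total_roi   # read M points per passed ROI point
    return time_window + divide_into_bins + read_passed_bins

n, m = 1024, 3
worst_case = temp_memory_depth(n, m, [n])   # total ROI size equal to N
print(worst_case, n * (2 * m + 1))          # 7168 7168
```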


It is important to note here that the $\sum F(x) \le N$ restriction is necessary for the
circuit to function. If $\sum F(x) > N$ and every bin passes the bin threshold analysis, then it
would take longer to read values out of temporary memory than it would to write them.
If this behavior persisted over several time windows, the circuit would eventually run out
of memory regardless of the depth. If this functionality is desired, the circuit would need
to be modified to ensure that FFT points in overlap regions between bins are not read
more than once.


As stated in the previous section, the temporary memory must be addressable in
integer multiples of MN. This can be done either by rounding up to 3MN or by
rounding down to 2MN. In order to round down, the ROI size must be further restricted
such that

$$\begin{aligned} MN + \sum_{x=0}^{\#ROI-1} F(x) + M \sum_{x=0}^{\#ROI-1} F(x) &\le 2MN \\ \sum_{x=0}^{\#ROI-1} F(x) &\le \frac{MN}{(M+1)} \end{aligned} \tag{IV.17}$$

In an example where M = 3 and N = 1024, the ROI size would be restricted to

$$\sum_{x=0}^{\#ROI-1} F(x) \le \frac{3(1024)}{(3+1)} = 768. \tag{IV.18}$$

The restriction in Equation (IV.17) indicates the minimum memory requirement for this
subsystem. As can be inferred from the inequality, in order to achieve a memory
requirement of MN, the total ROI size must equal zero. As M increases, a larger fraction
of N points can be included in the ROI because the fraction of time associated with
overhead becomes smaller.
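
The trend described above can be tabulated with a brief Python sketch of the bound in Equation (IV.17). The function name is hypothetical, and integer division is used because the ROI size is a whole number of FFT points.

```python
def max_total_roi(n, m):
    """Largest total ROI size that fits a temporary memory depth of 2MN."""
    return (m * n) // (m + 1)

n = 1024
for m in (1, 3, 7, 15):
    limit = max_total_roi(n, m)
    # The bound keeps MN + (M + 1) * limit within the 2MN depth.
    assert m * n + (m + 1) * limit <= 2 * m * n
    print(m, limit, limit / n)
```

For N = 1024, the allowed fraction of N grows from one half at M = 1 to three quarters at M = 3 and continues toward one as M increases.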


Another consideration when determining Temporary Memory size is the fact that
the memory depth of the dual-port RAM must correspond to the bit width of the signal
used to address memory. If w is the bit width of the memory address, then the depth of
memory must be $2^w$. If M is not a power of two, portions of Temporary Memory will be
assigned but left unused.

The initial SDR design set a Temporary Memory depth of 32N for each Dual-Port
RAM, which restricts the number of FFT periods in each time window to $M \le 15$. As
discussed in previous sections, the bit width of the FFT data points is 35 in this design.
The resulting memory used is

$$\begin{aligned}\text{Memory} &= 2 \times \text{Depth} \times \text{Bit Width} \\ &= 2 \times 2^{5} \times 2^{10} \times 35 = 2240 \text{ Kbit} = 280 \text{ KB}\end{aligned} \tag{IV.19}$$

2. Output Format Subsystem


The  performance  of  this  subsystem  is  discussed  in  [3].  There  is  no  delay  between 
the  time  an  output  signal  enters  the  subsystem  and  the  time  that  it  is  written  to  the 
downlink  FIFO  buffer.  This  subsystem  manages  the  input  to  the  downlink  FIFO  buffer 
using  signals  produced  by  previous  subsystems. 

The only memory requirement for the Output Control Subsystem is the FIFO
buffer that holds the output data frame while waiting for an external downlink signal. In
the initial design testing, downlink was only disabled for short periods of time to verify
that the circuit would restrict the amount of data chosen for downlink when the buffer
reached 25% of its capacity. For this reason, the FIFO buffer depth is only 32. For
maximum downlink flexibility, the end user should use all remaining memory resources
to increase the size of this buffer. The amount of memory used in the initial SDR design
is

$$\begin{aligned}\text{Memory} &= \text{Depth} \times \text{Bit Width} \\ &= 32 \times 70 = 2240 \text{ bits} = 280 \text{ Bytes}\end{aligned} \tag{IV.20}$$

F.  GENERALIZED  CIRCUIT  EXPECTATIONS 


The  previous  sections  discussed  the  timing  and  resource  usage  of  each  subsystem 
associated  with  the  portions  of  the  initial  SDR  design  used  for  signal  compression.  The 
following  generalizations  summarize  the  conclusions  reached  in  the  previous  sections. 
These  equations  were  created  based  on  both  analysis  of  the  circuit’s  design  and 
observations from simulations where N = 1024 and M = 3. The expressions should hold
valid for any configuration where $N \gg 16$ and M is a power of two.

The delay from the first signal input to the first output indicates the overall
latency of the design. The delay is not calculated after the output values are written to the
downlink FIFO buffer because the rate at which values are read out is dependent on
conditions external to the circuit. The delay to the time that the first header element is
written to the downlink FIFO buffer ($t_{hdr}$) is expected to be

$$t_{hdr} = \text{FFT Latency} + \underbrace{N(M-1) + 2}_{\text{Time Window}} + \underbrace{N + 1 + \sum_{x=0}^{\#ROI-1} F(x)}_{\text{Frequency Window}} + 1 + 2. \tag{IV.21}$$

The  delay  from  the  first  signal  input  to  the  time  that  the  last  output  is  written  to  the 
downlink  FIFO  buffer  indicates  the  amount  of  time  required  to  generate  the  data  frame 
for downlink ($t_{frame}$). If the number of passed bins is expressed as P, then $t_{frame}$ can be
expressed as

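\[
t_{frame} = t_{hdr} + \underbrace{P}_{\text{Frame Header}} + 2 + \underbrace{\sum_{x=0}^{P-1} F(x)}_{\text{Transmit Passed Bins}} + \underbrace{P - 1}_{\text{Delay Between Successive Bins}}. \tag{IV.22}
\]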

By approximating the total ROI size as $\sum F(x) \approx MN/(M+1)$, the total amount of
memory required for the system can be expressed as

\[
\text{Memory} = \text{FFT Memory} + \underbrace{(76 \text{ bits})\,N}_{\text{Time Window FIFO}} + \underbrace{(54 \text{ bits})\,2N}_{\text{Dual-Port RAM}} + \underbrace{(35 \text{ bits})\,4MN}_{\text{Temp Memory}} + \text{Pre-Downlink Storage}. \tag{IV.23}
\]


G.  SUMMARY 

This chapter provided a detailed analysis of the initial SDR design. State
transition diagrams were created to further illustrate the function of the design’s control
algorithms. General circuit equations were developed to create expectations for the
circuit’s timing and memory resource requirements. These expressions can be used as
design equations to determine appropriate values for M, N, and ROI size based on the
desired performance and the resources available in the target device. The next chapter
explains how this information can be used to make changes that improve both the
efficiency and the functionality of the design.

V. INITIAL MODIFICATIONS TO THE ORIGINAL DESIGN


The  conclusions  reached  in  the  previous  chapter  facilitate  making  changes  to  the 
circuit  without  impact  to  its  overall  functionality.  This  chapter  examines  how  the  circuit 
can  be  adjusted  to  improve  its  portability  between  different  FPGA  devices.  It  also 
introduces  bandwidth  and  resource-conserving  measures  through  the  use  of  the  conjugate 
symmetry  property  inherent  in  Fourier  Transforms  of  real  signals.  This  chapter 
demonstrates the use of the FFTv1.0 IP with the SDR circuit, permitting the circuit’s use
on a Virtex™-I FPGA.

A. INCREASE COMPRESSION AND MEMORY EFFICIENCY

This  section  discusses  how  the  conjugate  symmetry  property  for  Fourier 
Transforms  of  real  signals  can  be  used  to  improve  the  storage  and  downlink  efficiencies 
of  this  circuit. 

1.  Theory 

As discussed in [9], if the input signal to a Fourier Transform is real then the
output is conjugate symmetric as shown in the expression

\[
X(-F) = X^{*}(F), \quad \text{where } x(t) = x^{*}(t). \tag{V.1}
\]

Similarly, the Discrete Fourier Transform has a symmetric property. For a real input
sequence x[n], n = 0, ..., N-1, the corresponding output is

\[
X[k] = X^{*}[N-k] \quad \text{for } k = 0, \ldots, N-1. \tag{V.2}
\]

From this expression, it is evident that if the first N/2 points of the FFT output are
known, the remaining output points can be reproduced. Therefore, if the first N/2 FFT
output points are used for analysis, storage, and downlink selection then the remaining
N/2 FFT output points are redundant. This information can be used to improve the way
the SDR circuit stores and compresses information.
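
The symmetry in Equation (V.2) can be checked numerically. The following Python sketch is illustrative only: it uses a direct DFT on a short real test signal with N = 16 (an assumed size chosen for brevity, not the N = 1024 used by the circuit), and the function names are hypothetical. It rebuilds the points k = N/2+1, ..., N-1 from the first N/2 points. The point k = N/2 is its own mirror image under Equation (V.2), so it is not produced by this reconstruction.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) DFT, adequate for a small demonstration."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def mirror_from_first_half(first_half):
    """Rebuild X[N/2+1 .. N-1] from X[0 .. N/2-1] using X[k] = conj(X[N-k])."""
    half = len(first_half)  # half = N/2
    n = 2 * half
    return {k: first_half[n - k].conjugate() for k in range(half + 1, n)}


N = 16
signal = [
    math.cos(2 * math.pi * 3 * t / N) + 0.5 * math.sin(2 * math.pi * 5 * t / N)
    for t in range(N)
]
spectrum = dft(signal)
rebuilt = mirror_from_first_half(spectrum[: N // 2])

# Largest difference between the true upper-half points and the rebuilt ones.
max_error = max(abs(spectrum[k] - value) for k, value in rebuilt.items())
print(max_error)
```

The printed error should be at the level of floating-point rounding, which is consistent with treating the upper half of the spectrum as redundant for a real input.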

2. Changes Made


The FFT IP is implemented in a black box, so this portion of the circuit cannot be
changed. All portions of the circuit downstream from the FFT IP need to be adjusted to
use  only  half  of  the  output  points.  The  most  dramatic  change  is  to  the  Time  Windowing 
subsystem,  which  controls  the  timing  of  all  further  processing.  The  Frequency 
Windowing  subsystem  and  Temporary  Memory  subsystem  were  adjusted  to 
accommodate  reductions  in  the  amount  of  memory  required.  All  other  portions  of  the 
circuit  did  not  require  any  changes  because  their  functions  are  not  directly  dependent  on 
the  number  of  points  being  processed. 

a.  Changes  to  the  Time  Windowing  Subsystem 

As  discussed  in  Chapter  III,  the  Time  Windowing  subsystem  is  controlled 
by the pwr_time algorithm. In the initial SDR design, the pwr_time algorithm worked on
the  assumption  that  on  each  clock  cycle  a  new  point  was  entering  the  subsystem  to  be 
processed.  As  shown  in  Figure  16,  if  the  last  set  of  FFT  points  within  a  time  window  was 
received  and  the  prep  flag  was  not  set  then  the  algorithm  would  return  to  a  waiting  state, 
resetting  all  internal  variables. 

This algorithm was streamlined to only calculate the sum of the energy in
the first N/2 points. An additional waiting state was added, identified as State Two and
displayed in the modified state transition diagram shown in Figure 21. State Two is
entered after the first N/2 points are written to the FIFO buffer. This state retains all
internal variables, keeping track of the number of FFT output sets that have been
processed. On cue from the prep flag, the algorithm transitions from State Two to State
Three if the current FFT set is not the last one in the time window. If the current FFT set
is the last one in the time window, the algorithm transitions from State Two to State Five.
Similar to State Three in the original design, State Five activates output flags and disables
writing to the FIFO.

Two  additional  states  were  added  to  reflect  the  fact  that  the  circuit  behaves 
differently for two clock cycles following the last FFT input to be processed within a set.
State Four and State Six immediately follow State Three and State Five, respectively.
State Four ensures that writing to the FIFO is enabled for two clock cycles. State Six
ensures that the output is enabled. Adding these states brings the algorithm closer to the
Moore machine FSM model, although the output of State Three is still dependent on an
internal delay flag. State Seven was added to permit streamlined processing in the case
where M = 1. Instead of storing FFT points in the FIFO buffer, they are routed directly
to the subsystem output, and output flags are enabled.


Figure 21. State Transition Diagram for the Modified pwr_time Algorithm.


The algorithm must still wait for the FFT to process N(M - 1) points
before it can begin outputting the summed energy vector. As a result, there is no
difference in the amount of time required to calculate the energy in the time window. An
updated timing chart reflecting the new states is shown in Figure 22.

Figure 22. Timing of Modified pwr_time Algorithm.

b. Changes to Frequency Windowing Subsystem


The Frequency Windowing subsystem required only a few minor changes
to ensure that it makes the most efficient use of memory based on the new output from
the Time Windowing subsystem. By modifying the restriction on ROI size so that
$\sum F(x) < N/2 - 1$, the algorithm will have time to read all stored energy points before
the next energy vector needs to be written to memory. In a worst-case scenario, where
M = 1, all points must be written to memory and read out by frequency bin within N
clock cycles. This ensures that the next time window energy vector will not overwrite
memory that has not yet been read. The we_time_win algorithm was adjusted to write
only N/2 energy points to memory. The first point is read from memory after a write
delay of N/2 + 1. The time that the last read occurs should be at least one clock before
the next energy vector is written to memory. By restricting the ROI size so that
$\sum F(x) \le N/2 - 2$, this constraint will be met. If the user orders the frequency bins in
such a way that the first points of the energy vector are read first, this restriction could be
eased since there is less risk of memory overwrite.

Adding these restrictions on circuit timing and ROI size ensured that the
Frequency Windowing subsystem could be implemented using a RAM depth of only
N/2. This eliminated the need for the addr_hi signal, which was used to permit the
storage of multiple time window energy vectors. Since only one time window energy
vector is stored at a time, the mem_col signal used to control how many energy vectors
are written to memory was made obsolete and removed as an input to the subsystem. The
addr_lo signal was adjusted to use only nine bits for FFT index addressing.

c.  Changes  to  Temporary  Memory 

As discussed in Chapter III, the required temporary memory depth is
dependent on the amount of time required to compute the energy and downlink points of
interest. In the initial configuration, the temporary memory must be addressable in
integer multiples of MN. In an optimal memory configuration of 2MN, the total ROI size
must be restricted as shown in Equation (IV.17).

As discussed in the previous section, in the modified circuit the ROI is
restricted so that $\sum F(x) < N/2 - 1$. This easily meets the criteria of Equation (IV.17) to
use the optimal memory configuration of 2MN. Since the modified circuit only needs to
store the first N/2 points, the memory requirement is reduced to 2M(N/2) = MN.


While the original circuit needed to contend with continuous streaming
input, the modified circuit has a delay of N/2 clock cycles before successive time
windows are written to memory. If the ROI size were restricted so that all of the
information could be transmitted within N/2 clock cycles, then the memory depth could
be limited to MN/2. In order for this to occur, the ROI must be restricted to satisfy the
inequality

\[
\begin{aligned}
\sum F(x) + M \sum F(x) &< \frac{N}{2} \\
\sum F(x) &< \frac{N}{2(M+1)}
\end{aligned}
\tag{V.3}
\]
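
To show how quickly Equation (V.3) tightens as M grows, the bound can be evaluated for a few values of M. The Python sketch below is illustrative only; the function name max_total_roi_size is hypothetical, and the result is the largest integer total ROI size that still satisfies the strict inequality.

```python
def max_total_roi_size(n: int, m: int) -> int:
    """Largest integer S with S < N / (2 * (M + 1)), per Equation (V.3)."""
    # S * 2 * (M + 1) < N  is equivalent to  S <= (N - 1) // (2 * (M + 1)).
    return (n - 1) // (2 * (m + 1))


N = 1024
limits = {m: max_total_roi_size(N, m) for m in range(1, 5)}
for m, limit in limits.items():
    print(f"M = {m}: total ROI size may not exceed {limit} points")
```

For N = 1024 and M = 3, the total ROI size is limited to 127 points, compared with the limit of roughly N/2 that applies when the memory depth is MN.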


Unlike the inequality of Equation (IV.17), increasing M further restricts the permissible
size of the ROI. This option is explored as a possibility for further reducing memory
requirements, but is not recommended for circuits using continuous FFT output since it
could severely restrict the circuit’s utility to the end user.


The we_tmp algorithm was adjusted to return to a waiting state after
N/2 points are written to temporary memory, preserving an internal variable to keep
track of how many FFT periods have been stored. The algorithm still relies on the
mem_col signal to determine how many FFT periods should be stored in memory. The
algorithm resets its high-level memory address output to zero when the number of stored
FFT periods is equal to mem_col × M. The re_tmp algorithm was adjusted to ensure
that the bit widths of the memory addresses matched the new size of the RAM module in
the Temporary Memory subsystem. No other changes were necessary because the
re_tmp algorithm receives its cues from other algorithms already adjusted for a
N/2 configuration.

d. Changes to Test Configuration

The examples shown in Chapter III used a ROI with k = 1019...1023.
Since these points are now excluded from storage and analysis, new ROIs were created to
test the performance of the circuit. Modifications to the design were tested using a
configuration with N = 1024, M = 3, and user-defined ROIs k = 0...5 and k = 6...9.
The input signal was also adjusted to ensure energy was available in each frequency bin.
Additional frequencies were added to the signal described in Equation (III.5) to ensure
the FFT would produce detectable output for k = {3, 5, 7, 9}.


3. Update to Circuit Generalizations


The  changes  made  to  the  circuit  necessitate  updates  to  the  timing  and  memory 
expressions. Equation (IV.21) expresses the delay to the time that the first header
element is written to the downlink FIFO buffer ($t_{hdr}$). In the modified circuit, this value is
expressed as

\[
t_{hdr} = \text{FFT Latency} + \underbrace{N(M-1) + 2}_{\text{Time Window}} + \underbrace{\frac{N}{2} + 1 + \sum_{x=0}^{\#\text{ROI}-1} F(x) + 1}_{\text{Frequency Window}} + 2. \tag{V.4}
\]
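
The difference between Equations (IV.21) and (V.4) can be made concrete with a short Python sketch. It is illustrative only: the function names are hypothetical, the FFT latency is passed in as a parameter because its value depends on the IP configuration (zero is used here purely as a placeholder), and the ROI sizes of 6 and 4 points correspond to the test ROIs k = 0...5 and k = 6...9.

```python
def t_hdr_initial(fft_latency, n, m, roi_sizes):
    """Header delay of the initial design, following Equation (IV.21)."""
    time_window = n * (m - 1) + 2
    frequency_window = n + 1 + sum(roi_sizes) + 1
    return fft_latency + time_window + frequency_window + 2


def t_hdr_modified(fft_latency, n, m, roi_sizes):
    """Header delay of the modified design, following Equation (V.4)."""
    time_window = n * (m - 1) + 2
    frequency_window = n // 2 + 1 + sum(roi_sizes) + 1
    return fft_latency + time_window + frequency_window + 2


N, M = 1024, 3
roi_sizes = [6, 4]   # ROIs k = 0...5 and k = 6...9
fft_latency = 0      # placeholder; the real latency depends on the FFT IP

before = t_hdr_initial(fft_latency, N, M, roi_sizes)
after = t_hdr_modified(fft_latency, N, M, roi_sizes)
print(before, after, before - after)
```

Whatever latency is supplied, the two expressions differ by N/2 clock cycles, since only the frequency window term changes.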

There is no change to Equation (IV.22), which includes $t_{hdr}$ in the expression. Equation
(IV.23) expresses the amount of memory required for the system. In the modified circuit,
this value is expressed as

\[
\text{Memory} = \text{FFT Memory} + \underbrace{(76 \text{ bits})\,\frac{N}{2}}_{\text{Time Window FIFO}} + \underbrace{(52 \text{ bits})\,\frac{N}{2}}_{\text{Dual-Port RAM}} + \underbrace{(35 \text{ bits})\,2MN}_{\text{Temp Memory}} + \text{Pre-Downlink Storage}. \tag{V.5}
\]

B. INTEGRATING NEW FFT IP


As discussed in Chapter III, the initial SDR design used the FFTv4.1 IP to
compute the FFT. Chapter II highlighted some of the differences between the FFTv4.1
IP and the FFTv1.0 IP. If a Virtex™-I FPGA is the desired target device, the circuit
cannot use the FFTv4.1 IP. This section addresses how the circuit must be modified to
accommodate the FFTv1.0 IP, which is compatible with a Virtex™-I FPGA.

1. Replacing FFTv4.1


The  FFT  configuration  used  in  the  initial  SDR  design  is  shown  in  Figure  3.  The 
updated  configuration  is  shown  in  Figure  23.  As  discussed  in  Chapter  II,  the  original 
design manually scaled the output by 1/N. The FFTv1.0 automatically scales the output
by this factor, so these scaling blocks were removed. The start signal is connected to the
vin FFTv1.0 input, indicating when the FFT should begin computing. The real and
imaginary  input  signals  are  connected  to  the  appropriate  input  ports.  The  imaginary 
portion  of  the  input  signal  is  set  to  zero  outside  the  subsystem. 


Figure 23. FFTv1.0 as Used in the SDR.

Since the FFTv1.0 IP does not represent the e_done signal in its System Generator
interface,  the  signal  is  simulated  from  the  done  signal  by  delaying  all  other  output  signals 
one  clock  period.  The  Xk_index  signal  is  also  unavailable  through  the  System  Generator 
interface.  This  signal  is  simulated  using  a  counter.  The  counter  is  reset  by  the  done 
signal  and  counts  from  zero  to  1024,  incrementing  each  clock  period.  The  valid  signal  is 
used  to  control  a  multiplexer.  This  ensures  that  the  counter  output  is  forwarded  to  the 
next  subsystem  if  the  FFT  output  is  valid.  A  constant  zero  is  forwarded  if  the  data  is  not 
valid. 
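
The counter and multiplexer arrangement described above can be sketched in Python as a cycle-by-cycle model. This is only an illustration under stated assumptions: the function name xk_index_stream is hypothetical, the reset is assumed to take effect in the same clock period that done is asserted, and the counter is assumed to hold at its upper limit rather than wrap.

```python
def xk_index_stream(done, valid, limit=1024):
    """Model the simulated Xk_index output for per-clock done/valid inputs."""
    count = 0
    indices = []
    for done_bit, valid_bit in zip(done, valid):
        if done_bit:
            count = 0            # counter is reset by the done signal
        elif count < limit:
            count += 1           # otherwise it increments each clock period
        # Multiplexer: forward the count when valid, a constant zero otherwise.
        indices.append(count if valid_bit else 0)
    return indices


example = xk_index_stream(done=[1, 0, 0, 0], valid=[1, 1, 1, 0])
print(example)
```

In the example, the index advances while the output is valid and is forced to zero on the final clock, where valid is deasserted.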

2. Implementation Issues


As discussed in Chapter II, the FFTv1.0 samples incoming data points at one
quarter of the clock rate used for processing. In the System Generator environment, this
is controlled by setting the Simulink® system period to 0.25 in the System Generator
module interface as discussed in [3] and shown in Figure 24. Adjusting the sampling rate
for the FFTv1.0 IP module changes the sampling rates of other modules within the
design. This causes problems with the finite state machines in the remainder of the
circuit, which must sample input at the same frequency as the system clock rate.


Figure 24. System Generator Configuration for FFTv1.0 [After 16].

The workaround is to divide the circuit into two separate Simulink® designs.
From a development standpoint, this means that the circuit must be compiled in separate
parts. The two Simulink® designs can be used to generate HDL netlists that can be
reintegrated in the ISE environment. If more than one FPGA is available in the final circuit
configuration, it may be desirable to run the FFT computation and the remainder of the
SDR circuit on separate FPGAs.

The  first  Simulink®  model  for  the  adjusted  circuit  is  shown  in  Figure  25.  The 
second  Simulink®  model  is  shown  in  Figure  26.  When  testing  in  the  Simulink® 
environment,  the  FFT  model  is  run  first.  The  output  of  the  first  simulation  is  stored  in  the 
MATLAB® workspace variable FFTout as a two-dimensional matrix with four columns.
This  information  is  separated  and  reformatted  using  M-Code  so  that  it  can  be  used  as 
input  for  the  compression  algorithm.  Appendix  A  discusses  the  execution  of  tests  in  the 
Simulink®  environment  in  further  detail. 


Figure 25. FFTv1.0 Separated from Compression Algorithm.

3. Changes to Performance Expectations

As discussed in Chapter II, there is a 3N clock delay between the last output of
one FFT period and the first output of the next FFT period. The new pwr_time algorithm
handles this gracefully because it already anticipates a delay between valid inputs. The
same is true for the we_tmp algorithm. The expression for $t_{hdr}$ is updated for the FFTv1.0
IP to show that

\[
t_{hdr} = \text{FFT Latency} + \underbrace{4N(M-1) + 2}_{\text{Time Window}} + \underbrace{\frac{N}{2} + 1 + \sum_{x=0}^{\#\text{ROI}-1} F(x) + 1}_{\text{Frequency Window}} + 2. \tag{V.6}
\]

Figure 26. Compression Algorithm Separated from FFT [After 3].

The 3N clock delay means that the restriction on ROI size $\sum F(x)$ shown in
Equation (V.3) can be relaxed. Equation (V.3) indicates that all information needs to be
transmitted within N/2 clock cycles after the last FFT output point is written to
temporary memory. The additional delay means that 3N can be added to the right side of
the inequality so that

\[
\begin{aligned}
\sum F(x) + M \sum F(x) &< 3.5N \\
\sum F(x) &< \frac{3.5N}{M+1}
\end{aligned}
\tag{V.7}
\]

The adjustment to the inequality means that holding only one time window in memory is
now a feasible option, provided M < 6. As discussed in Chapter II, the output bit width
of the FFTv1.0 IP is less than the output bit width of the FFTv4.1 IP. This introduces the
potential for additional memory savings in a circuit that uses the FFTv1.0 IP. The
reduced memory requirement can be expressed as

\[
\text{Memory} = \text{FFT Memory} + \underbrace{(32 \text{ bits})\,\frac{N}{2}}_{\text{Time Window FIFO}} + \underbrace{(32 \text{ bits})\,\frac{N}{2}}_{\text{Dual-Port RAM}} + \underbrace{(16 \text{ bits})\,2M\,\frac{N}{2}}_{\text{Temp Memory}} + \text{Pre-Downlink Storage}. \tag{V.8}
\]

C.  ADJUSTING  HEADER  FORMAT  AND  DOWNLINK  CONTROL 

This  section  discusses  changes  that  improve  the  efficiency  of  the  initial  SDR 
design’s  output  mechanisms.  Changes  were  made  to  the  header  format,  and  a  throttling 
mechanism  was  added  to  ensure  that  only  valid  data  is  sent  to  the  output. 

1.  Changes  to  the  Header  Format 

As  discussed  in  Chapter  III,  the  initial  SDR  design  uses  a  variable-length  header 
to  indicate  the  time  window  being  transmitted,  the  number  of  ROI  that  passed  the  bin 
analysis,  whether  the  circuit  was  operating  in  a  constrained  memory  condition,  and  which 
ROI  passed  the  bin  analysis.  Each  element  of  the  header  is  transmitted  on  successive 
clock periods. This format is shown in Figure 27.

Figure 27. Initial SDR Design Header Format.


As  discussed  in  Chapter  III,  the  initial  SDR  design  transmits  a  70-bit  data  word  on 
each  clock  cycle.  The  format  of  the  header  in  the  initial  design  is  extremely  inefficient  in 
its  use  of  downlink  bandwidth.  The  information  in  the  header  can  be  compressed  into  a 
more  efficient  fixed-length  format.  One  way  to  ensure  the  header  has  a  fixed  length  is  to 
restrict  the  number  of  ROI  that  can  be  used.  Setting  the  possible  number  of  ROI  to  a 
predetermined  value  means  that  only  one  bit  is  required  for  each  ROI  to  indicate  whether 
it  passed  bin  analysis  or  not.  This  method  also  removes  the  need  to  transmit  the  number 
of  ranges  that  passed  bin  analysis  as  a  separate  quantity  since  this  could  be  computed  by 
summing  the  ROI  that  were  flagged.  A  modified  header  format  is  shown  in  Figure  28. 




Figure  28.  Modified  SDR  Header  Format. 


The  size  of  the  header  was  reduced  to  accommodate  an  output  bit  width  of  32, 
which is the minimum amount required to transmit the output of the FFTv1.0 IP. If the
output  bit  width  is  larger,  zeroes  can  be  prepended  to  the  header  to  match  the  output  data 
format.  The  format  of  the  header  was  changed  to  pass  all  required  information  within 
two  header  elements.  The  first  element  provides  room  for  expansion,  should  additional 
information  be  required  at  a  later  time.  In  its  current  format  the  first  element  contains  a 
28-bit  preamble  for  synchronization,  followed  by  16  bits  used  for  version  control.  The 
second element uses 16 bits to indicate the window number. Although this is a small
number for all examples used with this thesis, this slot could be used in the future to
communicate the time that the first FFT input was sampled for the time window. The
next 15 bits are flags to indicate whether the user-defined ROI passed bin analysis. The
last bit indicates if the circuit is operating under a memory constrained condition.




Figure  29.  Modified  State  Transition  Diagram  for  out_hdr  Algorithm. 


These  changes  were  implemented  by  modifying  the  out_hdr  algorithm.  The  new 
state  transition  diagram  is  shown  in  Figure  29.  In  State  Zero,  the  algorithm  waits  for  a 
flag  to  indicate  that  all  information  is  available  to  generate  the  header.  The  algorithm 
receives  the  start  flag,  window  number,  range  quantity,  and  memory  flag  at  the  same  time 
and  stores  the  values  in  internal  variables.  As  a  passing  ROI  is  read  from  a  FIFO  buffer, 
the value is evaluated using a switch statement. The switch statement sets a bit in a 16-bit
temporary mask corresponding to the appropriate ROI. The temporary mask is then
added to an ROI mask that is preserved until the header is ready to downlink. This
addition could be implemented using a bitwise OR. As soon as all ROI have been
processed, the algorithm outputs the first header element. On the next clock cycle, the
algorithm outputs the window number, ROI mask, and memory flag. Finally, the
algorithm  resets  all  internal  variables  and  transitions  back  to  the  waiting  state. 
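
The construction of the second header element can be sketched in Python. This is an illustration under stated assumptions rather than the M-Code used in the design: the function names are hypothetical, and the bit positions (window number in the upper 16 bits, ROI flag x at bit x + 1, memory flag in the least significant bit) are assumed from the field order given in the text. The ROI mask is accumulated with a bitwise OR, as suggested above.

```python
ROI_FLAG_BITS = 15   # number of ROI pass flags in the second header element
WINDOW_BITS = 16     # width of the window number field


def build_roi_mask(passed_rois):
    """Accumulate one flag bit per passing ROI using a bitwise OR."""
    mask = 0
    for roi in passed_rois:
        if not 0 <= roi < ROI_FLAG_BITS:
            raise ValueError(f"ROI index {roi} is out of range")
        mask |= 1 << roi
    return mask


def pack_second_element(window_number, passed_rois, memory_constrained):
    """Pack window number, ROI flags, and memory flag into one 32-bit word."""
    if not 0 <= window_number < (1 << WINDOW_BITS):
        raise ValueError("window number does not fit in 16 bits")
    mask = build_roi_mask(passed_rois)
    return (
        (window_number << (ROI_FLAG_BITS + 1))
        | (mask << 1)
        | int(bool(memory_constrained))
    )


word = pack_second_element(window_number=5, passed_rois=[0, 2], memory_constrained=True)
print(f"{word:#010x}")

# The number of passing ROI can be recovered by counting the set flag bits.
passed_count = bin((word >> 1) & ((1 << ROI_FLAG_BITS) - 1)).count("1")
print(passed_count)
```

Because the number of passing ROI can be recovered from the mask, it does not need to be transmitted as a separate header field.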

2. Changes to Downlink Control

As discussed in Chapter III, the initial SDR design used an external read enable
signal called rE_final to control when information is transmitted from the downlink FIFO
buffer.  While  this  seems  like  a  practical  means  of  leaving  downlink  decisions  in  the 
hands  of  an  external  communications  system,  the  initial  design  did  not  provide  a  means 
of  signaling  to  the  external  system  when  information  was  available  to  be  transmitted. 
Additionally,  there  is  a  one  clock  cycle  delay  between  each  ROI  that  is  output  from 
temporary  memory.  If  this  delay  is  forwarded  directly  to  the  transmitted  output  then  for 
each  transmitted  ROI,  one  clock  cycle  is  wasted  transmitting  invalid  information. 


Figure  30.  Modified  Format  Output  Subsystem  [After  3]. 

Rather than simply forward a data_valid signal to the output and leave decisions
in the hands of an external circuit, internal measures were added to enforce greater
control over the SDR output. Changes made to the Format Output subsystem are shown
in Figure 30. A multiplexer was added to transmit a constant one if the output data is
invalid. A finite state machine called OutputCtl was implemented in M-Code to control
the re input signal for the downlink FIFO buffer based on the availability of valid information
and the external system’s ability to receive the information. The state transition diagram
for this algorithm is shown in Figure 31.

The OutputCtl algorithm disables reading from the downlink FIFO buffer until a
header is available. Upon receipt of the hdr_v signal, the OutputCtl algorithm transitions
from a waiting state to a state that increments a counter variable. In this counting state,
reading from the FIFO buffer is still disabled. This ensures that all potential delays
between successive ROI do not impact the final downlink.


Figure 31. State Transition Diagram for OutputCtl Algorithm.


After counting for 15 clock cycles, the algorithm transitions to its output state. In
this state, the external rE_final signal is forwarded to the FIFO re signal. The algorithm
remains in this state until the downlink FIFO buffer is empty. If the hdr_v or tmp_busy
flags are asserted, the algorithm immediately transitions to its counting state. Otherwise,
the algorithm transitions back to the waiting state.
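
The behavior described for the OutputCtl algorithm can be sketched as a small Python state machine. This is an interpretation of the text, not the M-Code used in the design: the class layout, state names, and step interface are hypothetical, and the sketch assumes that the exit from the output state is evaluated only once the downlink FIFO buffer reports empty.

```python
from enum import Enum


class State(Enum):
    WAIT = 0     # no header available; FIFO reads disabled
    COUNT = 1    # header seen; hold off reads for a fixed number of clocks
    OUTPUT = 2   # external read enable is forwarded to the FIFO


class OutputCtl:
    COUNT_CYCLES = 15  # hold-off length given in the text

    def __init__(self):
        self.state = State.WAIT
        self.count = 0

    def step(self, hdr_v, tmp_busy, fifo_empty, re_final):
        """Advance one clock cycle and return the FIFO read enable (re)."""
        if self.state is State.WAIT:
            re = False
            if hdr_v:
                self.state, self.count = State.COUNT, 0
        elif self.state is State.COUNT:
            re = False
            self.count += 1
            if self.count >= self.COUNT_CYCLES:
                self.state = State.OUTPUT
        else:  # State.OUTPUT
            re = bool(re_final)
            if fifo_empty:
                # Assumed reading of the text: leave the output state only
                # after the FIFO has drained.
                if hdr_v or tmp_busy:
                    self.state, self.count = State.COUNT, 0
                else:
                    self.state = State.WAIT
        return re


ctl = OutputCtl()
trace = [ctl.step(hdr_v=True, tmp_busy=False, fifo_empty=False, re_final=True)]
trace += [ctl.step(False, False, False, True) for _ in range(16)]
print(trace)
```

In this trace the read enable stays low while the algorithm waits and counts, and the external rE_final value is forwarded only after the 15-cycle hold-off has elapsed.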

3. Timing Analysis


The new output format actually increases the amount of time required to generate
the first header output ($t_{hdr}$). This is because the modified out_hdr algorithm needs to
read and process each passed bin before generating the header. The difference is
inconsequential because this time is then removed from the expression for $t_{frame}$. In the
following expressions, the FFT latency is indicated by the term $t_{FFT}$. For a system using
FFTv4.1, the expression for $t_{hdr}$ is updated to

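\[
t_{hdr} = t_{FFT} + \underbrace{N(M-1) + 2}_{\text{Time Window}} + \underbrace{\frac{N}{2} + 1 + \sum_{x=0}^{\#\text{ROI}-1} F(x) + 1}_{\text{Frequency Window}} + 2 + \underbrace{\#\text{Passed Bins}}_{\text{Header Generation}}. \tag{V.9}
\]

For a system using FFTv1.0, the expression for $t_{hdr}$ is updated to

\[
t_{hdr} = t_{FFT} + \underbrace{4N(M-1) + 2}_{\text{Time Window}} + \underbrace{\frac{N}{2} + 1 + \sum_{x=0}^{\#\text{ROI}-1} F(x) + 1}_{\text{Frequency Window}} + 2 + \underbrace{\#\text{Passed Bins}}_{\text{Header Generation}}. \tag{V.10}
\]

The expression for $t_{frame}$ is updated to

\[
t_{frame} = t_{hdr} + 2 + \underbrace{(\text{Passed Bins})}_{\text{Transmit Passed Bins}} + \underbrace{\#\text{Passed Bins} - 1}_{\text{Delay Between Successive Bins}}. \tag{V.11}
\]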

As  discussed  in  Chapter  III,  these  expressions  indicate  the  time  required  to  write 
information  to  the  downlink  FIFO  buffer.  Since  reading  from  the  FIFO  buffer  is  directly 
dependent  on  an  external  signal,  no  further  timing  analysis  is  required. 

D. OPTIMAL MEMORY CONFIGURATIONS

This chapter examines how the circuit could be changed through understanding of
the generalizations made in Chapter III. This section shows the effectiveness of these
changes by demonstrating the maximum resource utilization in example configurations
for the Virtex™-IIP and the Virtex™-I FPGA devices.

1. Virtex™-IIP Implementation

The SDR design was configured using the FFTv4.1 IP with the memory
configurations shown in Table 6. For this example, the circuit was configured with
N = 1024 and M = 3. The table shows the expected memory utilization for the SDR in
this configuration.

A  larger  size  downlink  FIFO  was  desired  for  this  configuration,  preferably  the 
maximum  permissible  FIFO  depth  of  64k.  However,  the  maximum  bit  width  permissible 
for  this  depth  is  32  bits  [16].  The  FIFO  depth  was  set  to  1024,  which  permitted 
compilation. 

The  total  memory  required  should  be  approximately  340  kB,  which  constitutes 
21.5% of the resources available on the Virtex™-IIP FPGA. This would indicate that
there  is  significant  room  to  increase  N  and  M  if  additional  functionality  is  desired  on  this 
device. 


Purpose                     Depth    Bit Width    Memory Expectation
FFTv4.1                     UNK      UNK          288 kB
Time Windowing FIFO         512      76           4864 Bytes
Freq Windowing RAM          512      52           3328 Bytes
Temp Storage RAM            4096     35           17.5 kB
Temp Storage RAM            4096     35           17.5 kB
Downlink FIFO               1024     70           8960 Bytes
Total Memory Expectation                          340 kB

Table 6. Example Memory Expectation Using FFTv4.1 with N = 1024 and M = 3.
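
The entries in Table 6 can be cross-checked with a short Python sketch. It is illustrative only and rests on two assumptions: that "kB" in the table denotes 1024 bytes (which reproduces the 17.5 kB entries), and that the FFTv4.1 figure is taken directly from the table rather than derived. The variable and function names are hypothetical.

```python
KIB = 1024  # assumption: "kB" in Table 6 means 1024 bytes

# (purpose, depth, bit width) for the memories whose dimensions are listed.
memories = [
    ("Time Windowing FIFO", 512, 76),
    ("Freq Windowing RAM", 512, 52),
    ("Temp Storage RAM", 4096, 35),
    ("Temp Storage RAM", 4096, 35),
    ("Downlink FIFO", 1024, 70),
]
FFT_BYTES = 288 * KIB  # taken directly from the table, not derived here


def memory_bytes(depth, bit_width):
    """Memory in bytes for a block of the given depth and bit width."""
    return depth * bit_width // 8


for purpose, depth, width in memories:
    print(f"{purpose:<22}{memory_bytes(depth, width):>8} bytes")

total_bytes = FFT_BYTES + sum(memory_bytes(d, w) for _, d, w in memories)
print(f"Total: {total_bytes / KIB:.2f} kB")
```

Under these assumptions the total comes to just under 340 kB, which agrees with the rounded value in the last row of the table.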


The circuit was generated to the HDL netlist level using System Generator. The
circuit was synthesized using the Xilinx ISE Project Navigator. The resulting resource
estimation far exceeded the predicted amount, as shown in Table 7. The circuit uses 56
blocks of 18 kB BRAM, corresponding to 1008 kB total memory usage. In this case, the
disparity between the predicted value and the one generated through synthesis does not
prevent the circuit from being compiled. This anomaly is explored in more detail with
the Virtex™-I implementation.


Resource        Used     Available    Percent Used
Slices          6267     9792         68%
Flip Flops      10486    19584        53%
4 input LUTs    9234     19584        47%
Bonded IOBs     96       552          17%
BRAM            56       88           63%
MULT18X18       56       88           63%
GCLK            1        16           6%

Table 7. Resource Estimation for SDR design [From 20].


2. Virtex™-I Implementation

As discussed in Chapter II, the FFTv1.0 IP using triple memory configuration
with N = 1024 uses 75% of the memory resources on a Virtex™-I FPGA. Even with the
memory saving measures discussed earlier in this chapter, it is not feasible to fit a
functioning SDR design on a single Virtex™-I FPGA using the FFTv1.0 IP
where N = 1024. The resources required in an example where M < 4 are shown in
Table 8.

In  order  to  further  constrain  the  memory  requirement,  full  precision  was  not  used 
for  the  bin  energy  calculation.  In  the  initial  SDR  design,  all  arithmetic  calculations  added 
bits  to  the  output  data  word  to  prevent  overflow.  This  level  of  precision  is  unnecessary 
because  the  signal  only  needs  to  indicate  if  there  is  enough  energy  to  pass  the  threshold. 
Therefore,  the  data  signal  only  needs  to  have  enough  precision  for  the  minimum 
threshold.  If  arithmetic  operations  result  in  overflow,  the  signal  can  simply  be  saturated 
at  its  maximum  value. 
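
The difference between saturating and wrapping on overflow can be illustrated with a short sketch. It assumes unsigned values and a 16-bit word (the width listed in Table 8); the function names and the example threshold are hypothetical.

```python
def saturating_add(a, b, bits):
    """Add two unsigned values and clamp the result to the largest `bits`-wide value."""
    max_value = (1 << bits) - 1
    return min(a + b, max_value)


def wrapping_add(a, b, bits):
    """Add two unsigned values and keep only the low `bits` bits (overflow wraps)."""
    return (a + b) & ((1 << bits) - 1)


if __name__ == "__main__":
    threshold = 50000  # hypothetical energy threshold
    sat = saturating_add(60000, 10000, 16)
    wrap = wrapping_add(60000, 10000, 16)
    print(sat, sat >= threshold)    # saturated value still passes the threshold
    print(wrap, wrap >= threshold)  # wrapped value would wrongly fall below it
```

Because the signal only needs to show whether the threshold was passed, clamping at the maximum value preserves the decision even though the exact sum is lost.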

Purpose                      Depth    Bit Width    Memory Expectation
FFTv1.0                      UNK      UNK          12 kB
Time Windowing FIFO          512      16           1 kB
Freq Windowing RAM           512      16           1 kB
Temp Storage RAM             2048     16           4 kB
Temp Storage RAM             2048     16           4 kB
Downlink FIFO                1024     32           4 kB
Total Memory Expectation                           26 kB

Table 8.  Example Memory Configuration Using FFTv1.0 IP.


If a multiple-FPGA system is available as the target device, then the design could
be distributed among the devices. As discussed in Section B, the FFTv1.0 must
be compiled separately from the remainder of the circuit. This neatly divides the design
into two pieces that could each fit on its own Virtex™-I FPGA. System Generator
compiled the compression-only portion of the circuit, creating a Xilinx ISE project. The
circuit was synthesized through the Xilinx ISE Project Navigator. The resulting resource
estimation showed that the circuit required six kB of BRAM more than the expected
value.

In order to produce a configuration that could plausibly run, the circuit was further
divided. Bin Analysis functions were assigned to one FPGA. Temporary Storage and
Downlink Control went to another FPGA, creating a three-FPGA configuration. The new
configuration is shown in Figure 32.

Figure 32.  Partitioned Compression Algorithm for a Three-FPGA Configuration [After 3].


System Generator was used to create a Xilinx ISE project for each part of the
compression algorithm. Both designs were synthesized using Xilinx ISE Project
Navigator to produce resource estimates. The bin analysis portion is shown in Table 9.
The temporary storage and downlink control portion is shown in Table 10. The Temporary
Storage and Downlink Control partition met expectations, with 24 BRAM blocks used.
At 512 bytes for each block, this means that 12 kB was used, which correlates with the
information presented in Table 8: the Temporary Storage RAM and Downlink FIFO
should use 12 kB of BRAM.


Resource         Used     Available    Percent Used
Slices           859      12288        6%
Flip Flops       344      24576        1%
4 input LUTs     1525     24576        6%
Bonded IOBs      91       404          22%
BRAM             16       32           50%
GCLK             1        4            25%

Table 9.  Resource Estimation for Bin Analysis [From 20].


Resource         Used     Available    Percent Used
Slices           109      12288        < 1%
Flip Flops       63       24576        < 1%
4 input LUTs     175      24576        < 1%
Bonded IOBs      114      404          28%
BRAM             24       32           75%
GCLK             1        4            25%

Table 10.  Resource Estimation for Temporary Storage and Downlink Control [From 20].

The extra six kB memory requirement comes from the Bin Analysis portion of the
circuit, as shown in Table 9. The Time Windowing FIFO and Frequency Windowing
RAM should only require one kB of memory each. Further analysis to isolate the exact
cause of this anomalous memory requirement is not included within this thesis work.
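
The six kB figure can be traced from the block counts. As a worked check, assuming the 512-byte block size quoted above, the 16 BRAM blocks reported for Bin Analysis in Table 9, and the one-kB expectations for the two windowing memories:

```latex
\begin{align*}
\text{BRAM used by Bin Analysis} &= 16 \times 512\ \text{bytes} = 8192\ \text{bytes} = 8\ \text{kB} \\
\text{BRAM expected} &= 1\ \text{kB} + 1\ \text{kB} = 2\ \text{kB} \\
\text{Excess} &= 8\ \text{kB} - 2\ \text{kB} = 6\ \text{kB}
\end{align*}
```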

E.  SUMMARY

In this chapter, the information discussed in Chapter III was used to make changes
to the initial SDR design. A memory and bandwidth conserving measure was
implemented that takes advantage of the conjugate symmetry property of Fourier
Transforms. The FFTv4.1 IP was replaced with the FFTv1.0 IP, creating a configuration
that can be used on a Virtex™-I FPGA. The output format was adjusted to provide better
cueing to an external communications system and make more efficient use of bandwidth.
Finally, the effectiveness of these changes was demonstrated by compiling the design for
both the Virtex™-IIP FPGA and the Virtex™-I FPGA. The next chapter discusses
measures that increase the fault tolerance of the SDR design, making it more suitable for
the space environment.

V.  FAULT DETECTION


Ultimately, this SDR design is intended for the space environment. As discussed
in [25], this means that the circuit will be vulnerable to Single Event Upsets (SEU). An
SEU occurs when a high-energy particle causes a one-time bit flip, either in memory or in
the output of a combinational circuit. As discussed in [6], SEUs have special
implications for FPGA devices. If the SEU occurs on the output of combinational logic
or in data memory, then the effect is transitory. If the SEU occurs in the FPGA
configuration memory, the circuit configuration will change. This may produce
continuous errors, or may only produce errors for a specific input set, depending on the
location of the configuration fault.

This chapter explores ways of detecting faults in the circuit. Fault detection flags
are communicated to the ground along with the output data. If the fault is singular, the
ground user may elect to discard the data. If the fault is continuous, the user may elect to
reload the FPGA configuration to remove the fault. Alternatively, the SDR controller in
space may be designed to reload the configuration after a certain number of repeated
errors.

A.  CONSIDERATIONS

1.  SDR  Considerations 

As discussed in [25], the vulnerability of a circuit to SEUs is dependent on its
area. Large circuits are more susceptible to SEUs than small circuits. In implementing
fault detection for the SDR design, the portions of the circuit requiring the largest use of
combinational logic and memory were selected for fault detection algorithms. As
discussed in Chapter III, the FFT IP represents the largest use of both combinational logic
and memory.

As discussed in [5], Triple Modular Redundancy (TMR) presents one means of
error detection and correction. Three copies of the circuit make the same calculation.
Their results are compared by a majority voter. If one of the solutions does not agree
with the other two, an error has occurred and the erroneous output is discarded. As
discussed in Chapter IV, neither the Virtex™-IIP nor the Virtex™-I FPGA would have
room to accommodate two additional copies of the FFT IP.
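
The voting step of TMR can be sketched in a few lines. This is an illustration only, assuming the three redundant results are available as non-negative integers; the function name is hypothetical and no such voter is part of this SDR design.

```python
def tmr_vote(a, b, c):
    """Bitwise majority vote over three redundant integer results.

    Returns (voted_value, mismatch). Each output bit takes the value held by
    at least two of the three copies; mismatch is True if any copy disagreed.
    """
    voted = (a & b) | (a & c) | (b & c)
    mismatch = not (a == b == c)
    return voted, mismatch


if __name__ == "__main__":
    print(tmr_vote(0x2A, 0x2A, 0x2A))  # all copies agree
    print(tmr_vote(0x2A, 0x2B, 0x2A))  # one copy has a flipped bit and is outvoted
```

The voter itself is small; the cost of TMR lies in the two extra copies of the protected circuit, which is the limiting factor noted above for the FFT IP.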

Snodgrass  introduced  the  concept  of  Reduced  Precision  Redundancy  (RPR)  in 
[5].  If  a  less-precise  numerical  solution  is  acceptable,  then  two  low-precision  copies  of 
the  circuit  could  be  used  to  generate  an  upper  and  lower  bound.  If  the  precise  solution  is 
outside  the  bounds,  an  error  has  occurred.  In  that  case,  the  average  between  the  upper 
and  lower  bound  is  used  instead  of  the  precise  output. 
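
The RPR decision rule described above can be sketched as follows. The way the bounds are produced is not specified here, so the sketch simply takes them as inputs; the function name and the example values are hypothetical.

```python
def rpr_select(precise, lower, upper):
    """Apply the RPR rule to one result.

    Returns (output, fault_detected). The precise value is kept when it lies
    within the bounds from the two reduced-precision copies; otherwise the
    midpoint of the bounds is substituted.
    """
    if lower <= precise <= upper:
        return precise, False
    return (lower + upper) / 2, True


if __name__ == "__main__":
    print(rpr_select(100, 96, 103))  # within bounds, precise value kept
    print(rpr_select(228, 96, 103))  # outside bounds, midpoint used instead
```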

While RPR looks like a feasible means of providing fault correction by trading
precision for area, it would not produce significant memory savings for the FFT IP used
in this design. The FFT IP prevents any modification to the internal circuitry beyond the
configuration options presented to the user. This prevents the use of either RPR or TMR
for low-level calculations. As discussed in [22], the FFTv1.0 phase factor bit width is
fixed at 16 bits. Intermediate values are stored at this level of precision, independent of
the input data’s precision. The number of memory blocks required for the FFTv4.1 IP is
specifically listed in [15], grouped by target device and configuration options. As with
the FFTv1.0 IP, the amount of memory required is independent of the input precision.

For this SDR design, neither TMR nor RPR seems to be a feasible option for
detecting errors that occur within the FFT IP. In order to work with the SDR design,
fault detection algorithms must use only the remaining chip space shown in Chapter IV.
A simple computation relating the FFT input to the FFT output is desired.

2.  Parseval’s Theorem

As discussed in [9], the Discrete Fourier Transform has the inner product
property, also known as Parseval’s Theorem, where

\[
\sum_{n=0}^{N-1} x^{*}[n]\, y[n] = \frac{1}{N} \sum_{k=0}^{N-1} X^{*}[k]\, Y[k]. \tag{VI.1}
\]

This equation relates the input to the FFT directly to the output of the FFT in a way that
is much less computationally intensive than the FFT algorithm. Because this expression
does not produce a duplicate FFT output it cannot be used for error correction, but it can
be used for error detection. Both sides of the equation require N complex multiplication
operations and N complex addition operations.


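Equation (VI.1) must be adjusted so that it can be applied in this design. The
input to the FFT is a real signal. The term y[n] is replaced with x[n]. The left side of the
expression is adjusted to

\[
\begin{aligned}
\sum_{n=0}^{N-1} x^{*}[n]\, y[n] &= \sum_{n=0}^{N-1} x^{*}[n]\, x[n] \\
&= \sum_{n=0}^{N-1} \left( \mathrm{Re}^{2}\{x[n]\} + \mathrm{Im}^{2}\{x[n]\} \right) \\
&= \sum_{n=0}^{N-1} \mathrm{Re}^{2}\{x[n]\}.
\end{aligned} \tag{VI.2}
\]

As discussed in Chapter II, the output of the FFT is already scaled by a factor
of 1/N. The right side of the equation is adjusted to use the real and imaginary portions
of the signal. Then an additional scaling factor of 1/N is included on each side. This
produces an expression that can be implemented using the FFT input and output, where

\[
\begin{aligned}
\sum_{n=0}^{N-1} \mathrm{Re}^{2}\{x[n]\} &= \frac{1}{N} \sum_{k=0}^{N-1} X^{*}[k]\, X[k] \\
\sum_{n=0}^{N-1} \mathrm{Re}^{2}\{x[n]\} &= \frac{1}{N} \sum_{k=0}^{N-1} \left( \mathrm{Re}^{2}\{X[k]\} + \mathrm{Im}^{2}\{X[k]\} \right) \\
\frac{1}{N} \sum_{n=0}^{N-1} \mathrm{Re}^{2}\{x[n]\} &= \sum_{k=0}^{N-1} \left[ \left( \frac{\mathrm{Re}\{X[k]\}}{N} \right)^{2} + \left( \frac{\mathrm{Im}\{X[k]\}}{N} \right)^{2} \right].
\end{aligned} \tag{VI.3}
\]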

Although Equation (VI.3) can be easily implemented in hardware, it presents a
timing problem when used with this design. The left side of the expression can be
calculated before the FFT circuit has completed its computation. As discussed in Chapter
IV, the compression portion of the design now uses only the first N/2 points of the FFT
output. Depending on the size of the user-defined ROI, the circuit might not be able to
complete computation of the right side of Equation (VI.3) before the header is ready for
downlink.


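The conjugate symmetry property shown in Equation (V.1) and Equation (V.2)
can also be used to reduce the right side of Equation (VI.3). This is shown in the
expression

\[
\sum_{k=0}^{N/2-1} X^{*}[k]\, X[k] = \sum_{k=N/2}^{N-1} X[k]\, X^{*}[k]. \tag{VI.4}
\]

This means that Equation (VI.3) can be adjusted to

\[
\frac{1}{N} \sum_{n=0}^{N-1} \mathrm{Re}^{2}\{x[n]\} = 2 \sum_{k=0}^{N/2-1} \left[ \left( \frac{\mathrm{Re}\{X[k]\}}{N} \right)^{2} + \left( \frac{\mathrm{Im}\{X[k]\}}{N} \right)^{2} \right]. \tag{VI.5}
\]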


By  using  Equation  (VI.5),  the  number  of  arithmetic  operations  required  to  calculate  the 
right  side  of  the  expression  is  reduced  by  half.  This  ensures  that  both  sides  of  the 
expression  can  be  calculated  and  compared  in  time  to  include  error  detection  information 
in  the  header  for  each  time  window  data  frame. 
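
The check based on Equation (VI.5) can be sketched in software. This is a minimal illustration and not the hardware implementation: it assumes a real input, uses a naive DFT scaled by 1/N in place of the FFT IP, and uses hypothetical function names and an arbitrary threshold. The test signal is chosen with no energy in the k = 0 and k = N/2 bins, because the half-spectrum sum equals the full sum exactly only when those two bins carry equal energy.

```python
import cmath
import math


def scaled_dft(x):
    """Naive DFT scaled by 1/N, standing in for an FFT whose output is X[k]/N."""
    n = len(x)
    return [
        sum(x[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n)) / n
        for k in range(n)
    ]


def input_energy(x):
    """Left side of (VI.5): (1/N) times the sum of x[n]^2 for a real input."""
    return sum(v * v for v in x) / len(x)


def half_spectrum_energy(xs):
    """Right side of (VI.5): twice the energy in the first N/2 scaled outputs."""
    half = len(xs) // 2
    return 2 * sum(p.real ** 2 + p.imag ** 2 for p in xs[:half])


def parseval_fault(x, xs, threshold):
    """True when the two sides of (VI.5) differ by more than the threshold."""
    return abs(input_energy(x) - half_spectrum_energy(xs)) > threshold


if __name__ == "__main__":
    N = 32
    x = [math.cos(2 * math.pi * 3 * n / N) + 0.5 * math.sin(2 * math.pi * 5 * n / N)
         for n in range(N)]
    xs = scaled_dft(x)
    print(parseval_fault(x, xs, 2 ** -10))   # fault-free output

    corrupted = list(xs)
    corrupted[3] = complex(1.0, corrupted[3].imag)  # inject a constant error
    print(parseval_fault(x, corrupted, 2 ** -10))
```

Only N/2 output points are accumulated, which is the property that lets the comparison finish in time for the header.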


3.  Parity  Checking 

As discussed in [23], a parity bit can be used to detect corruption of data. For
even parity, a bit is appended to the data word to ensure that the number of ones in the
word is even. If the number of ones in the data word is even, the parity bit will be zero.
If the number of ones in the data word is odd, the parity bit will be one. Wakerly shows
that the Exclusive Or (XOR) function can be used to determine the appropriate value of
the parity bit [23]. If an odd number of errors occur in the data word, the parity bit will
detect that an error is present. If the number of errors in the word is even, they will not
be detected by a single parity bit.
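
The behavior described above can be demonstrated with a short sketch. It assumes non-negative integer data words; the function names are hypothetical.

```python
def even_parity_bit(word):
    """XOR of all bits in `word`: 1 if the number of ones is odd, otherwise 0."""
    parity = 0
    while word:
        parity ^= word & 1
        word >>= 1
    return parity


def parity_error(word, stored_parity):
    """True when the parity recomputed from `word` disagrees with the stored bit."""
    return even_parity_bit(word) != stored_parity


if __name__ == "__main__":
    word = 0b1011                      # three ones, so the parity bit is 1
    p = even_parity_bit(word)
    print(p)
    print(parity_error(word ^ 0b0001, p))  # one flipped bit is detected
    print(parity_error(word ^ 0b0011, p))  # two flipped bits go undetected
```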

B.  MODIFICATIONS TO DESIGN

This section discusses how the design was modified to add fault tolerance. The
baseline for changes is the example Virtex™-IIP configuration shown in Chapter IV.D.1,
where M = 3 and N = 1024. This model incorporates the changes discussed in Chapter
IV. The Virtex™-IIP configuration was selected because using only one chip was
desirable to facilitate signal routing while adding new features to the design. Parseval’s
Theorem was used to detect errors in the FFT computation. Parity bits are used to detect
errors in other memory blocks within the circuit.

1.  Error Checking Using Parseval’s Theorem

An error in the FFT computation is detected by calculating the left and right sides
of Equation (VI.5) independently. If the results differ by more than a predetermined
threshold, an error flag for the FFT time window is forwarded to the header generation
algorithm out_hdr. The header was adjusted to forward error flags for the time windows.

a.  Implementation 

The FFT subsystem was adjusted to calculate the left side of Equation
(VI.5), as shown in Figure 33. A multiplier IP block is used to calculate Re²{x[n]}.
The subsystem labeled Accum developed in [3] uses a pair of accumulators to
continuously sum the streaming input. This subsystem was used because of its
demonstrated reliability. It is possible that the accumulator subsystem could be replaced
with a simpler configuration using only one accumulator. The accumulator would need
to be configured to reinitialize with the current input on reset, as discussed in [24]. This
would permit the system to calculate the sum of continuous inputs using only one
accumulator.

Figure 33.  FFT Subsystem Modified for Error Detection [After 3].


The system relies on timing signals based on the FFTv4.1 IP to indicate
the beginning and end of each FFT period. The xn_index signal corresponds to the index
of the input points. This value is compared to 1023 to detect the end of the current input
period. The result of the comparison is delayed by one clock to indicate the start of the
next period. The circuit would need to be adjusted for the FFTv1.0 IP by using a counter
to simulate the xn_index signal.

The FFT subsystem scales the accumulator output by 1/N, then delays the
signal to align it with the FFT output. The value $\frac{1}{N}\sum_{n=0}^{N-1}\mathrm{Re}^{2}\{x[n]\}$
leaves the FFT subsystem as the check signal. This signal is routed to the Time
Windowing subsystem. As discussed in Chapter III, the SDR compression algorithm
determines the energy in each FFT point by calculating
$(\mathrm{Re}\{X[k]\}/N)^{2} + (\mathrm{Im}\{X[k]\}/N)^{2}$. This value is routed into a
new FFT Error Detection Subsystem, as shown in Figure 34. The e_done and p_cnt
signals are also routed to the FFT Error Detection Subsystem for timing purposes. As
discussed in Chapter III, these signals indicate the start of the FFT output period and the
current FFT output index, respectively.


Figure 34.  Modification for FFT Error Detection [After 3].


Within the FFT Error Detection Subsystem, another copy of the
Accumulator subsystem is used to calculate the sum of the pwr_pts signal, as shown in
Figure 35. The sum is scaled by a factor of two, completing all calculations required for
Equation (VI.5). The difference between the left and right sides of the equation is
calculated using a subtraction IP block. The difference is compared with a positive and a
negative threshold. If both comparisons yield a true result, then no error exists. In any
other case, the Error flag will be set. The e_done signal is used to indicate the start of an
FFT output sequence. The p_cnt signal is used to determine when enough points have
been collected to calculate the sum on the right side of Equation (VI.5). When this point
in time is reached, the valid_prep flag is set to indicate that the Error flag will be valid on
the next clock cycle.

Figure 35.  FFT Error Detection Subsystem.


The Error flag and valid_prep flag are passed to the ErrorFlagCtl
algorithm. This algorithm uses the Error flags from each FFT period within the
user-defined time window to generate an error code, indicated by the err_win signal in
Figure 34. The length of the error code is equal to M, the number of FFT periods in each
time window. Each bit is a flag that indicates whether or not an error was detected within
that FFT period. The state transition diagram for the ErrorFlagCtl algorithm is shown in
Figure 36.



Figure  36.  State  Transition  Diagram  for  the  ErrorFlagCtl  Algorithm. 
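
The way an M-bit error code is assembled from per-period flags can be sketched as a simple shift register. This is not a transcription of the state diagram in Figure 36; it is a behavioral sketch that assumes the flag for the first FFT period ends up in the leftmost bit, and the function names are hypothetical.

```python
def build_error_code(period_flags):
    """Pack one error flag per FFT period into an integer, first period leftmost."""
    code = 0
    for flag in period_flags:
        code = (code << 1) | (1 if flag else 0)
    return code


def format_error_code(code, m):
    """Render the code as an M-character binary string."""
    return format(code, "0{}b".format(m))


if __name__ == "__main__":
    m = 3
    # No errors in any of the M = 3 FFT periods.
    print(format_error_code(build_error_code([False, False, False]), m))
    # First period error-free, second and third periods flagged.
    print(format_error_code(build_error_code([False, True, True]), m))
```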


The error code is forwarded to the Header Generation Subsystem to be
included in the header for each time window frame. An additional FIFO buffer was
added to the subsystem to store the error code until the completion of the bin energy
analysis. The header format was adjusted to include the error code, as shown in Figure
37.

Figure 37.  Header Format with Error Code.


b.  Testing 


A small set of tests was conducted to ensure the circuit functions as
desired, and to determine an appropriate value for the threshold. The initial error
threshold was set to 2 . The FFT error checking measures were first tested with no
additional adjustments to the circuit. As expected, an error code of “000” was generated,
indicating that no errors were detected.

Next, a multiplexer was inserted into the path of the FFT output
corresponding to Re{X[k]}/N, as shown in Figure 38. The error injection circuit
compares the output of a free-running counter with the constant 3000 to determine
whether the correct value or a constant error will be forwarded. Looking at the first time
window, this means that the first FFT period is error-free, while the second and third will
contain errors. The circuit failed to detect the error using the initial threshold. When the
threshold was lowered to 2 the expected error code of “011” was generated. The code
was properly forwarded to the header generation algorithm, and was included in the data
frame sent to the downlink.


Figure  38.  Error  Injection  Circuit  for  FFT  Output. 

2.  Memory Error Detection

As discussed in Section A, parity can be used to detect errors in memory. The
most likely place for a memory error in this circuit is in the Temporary Storage
subsystem. This subsystem has the largest memory requirement in the circuit outside the
FFT IP. It also has the longest temporal storage requirement. As discussed in Chapter
III and Chapter IV, information could be held for up to MN clock cycles. For these
reasons, the Temporary Storage Subsystem was selected as the best location to
implement a memory error detection algorithm.

a.  Implementation

As  discussed  in  Chapter  III  and  Chapter  IV,  the  signals  stored  in  the 
Temporary  Memory  Subsystem  are  35-bit  numbers  when  using  the  Virtex™-IIP 
configuration.  In  order  to  generate  parity  bits  for  each  incoming  number,  the  circuit  must 
evaluate  the  expression 

\[
P = \mathrm{XOR}\{\mathrm{Bit}[34], \mathrm{Bit}[33], \ldots, \mathrm{Bit}[1], \mathrm{Bit}[0]\} \tag{VI.6}
\]

To implement Equation (VI.6), a tree of Bit Basher blocks is placed in series with a tree
of XOR gates, as shown in Figure 39.

Figure 39.  35-bit Parity Generator.


As discussed in [24], the Bit Basher blocks require no hardware overhead.
They are used here as a means of re-interpreting a single multiple-bit signal as multiple
signals with smaller bit widths. Each input signal can only be divided four ways, so a
tree of Bit Basher blocks is required to re-interpret a single 35-bit signal as 35 one-bit
signals. This method of error detection will also work with the Virtex™-I configuration,
provided a parity generator for a 16-bit data word is used.

The 35-bit parity generator was added to the Temporary Memory
Subsystem to generate a one-bit parity code from the FFT point entering memory. This is
shown for the real portion of the signal in Figure 40. The parity bit is stored in a separate
Dual-Port RAM using the same addressing signals as the Dual-Port RAM for the data
word. On the output from the Dual-Port RAM, a new parity bit is generated and
compared with the one stored in memory using an XOR gate. If the bits do not match,
then an error has occurred. This design is duplicated for the imaginary portion of the
signal.


Figure  40.  Modification  for  Memory  Error  Detection  (After:  [3]). 
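
The arrangement of a data RAM with a parallel parity RAM can be modeled in software. The class below is a behavioral sketch only, assuming a 35-bit word and a single shared address for both memories; the class and method names are hypothetical and do not correspond to blocks in the Simulink model.

```python
class ParityProtectedRam:
    """Data RAM plus a one-bit parity RAM that share the same address."""

    def __init__(self, depth, width=35):
        self.width = width
        self.data = [0] * depth
        self.parity = [0] * depth

    @staticmethod
    def _parity(word):
        # Counting ones modulo two gives the same result as XOR of all bits.
        return bin(word).count("1") & 1

    def write(self, addr, word):
        """Store the word and the parity bit generated from it."""
        word &= (1 << self.width) - 1
        self.data[addr] = word
        self.parity[addr] = self._parity(word)

    def read(self, addr):
        """Return (word, error); error is the XOR of the new and stored parity."""
        word = self.data[addr]
        error = self._parity(word) ^ self.parity[addr]
        return word, bool(error)


if __name__ == "__main__":
    ram = ParityProtectedRam(depth=4096)
    ram.write(10, 0x123456789)
    print(ram.read(10))        # no upset, error flag clear
    ram.data[10] ^= 1 << 20    # simulate a single bit flip in the data RAM
    print(ram.read(10))        # error flag set
```

In the design described above, one such structure would serve the real portion of the signal and a duplicate would serve the imaginary portion.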


The  error  flags  are  forwarded  to  the  Format  Output  Subsystem.  A  finite 
state  machine  was  created  to  accumulate  error  flags  and  generate  an  error  code  that  is 
appended  to  the  end  of  the  downlink  data  frame.  The  state  transition  diagram  for  the 
ParityFlagCtl  algorithm  is  shown  in  Figure  41.  In  State  Zero,  the  algorithm  waits  for 
output  from  the  Temporary  Memory  Subsystem.  An  initial  error  code  of  three  was  used 
for  troubleshooting  purposes. 

When the tmp_busy flag is asserted, valid output from the Temporary
Memory subsystem will be available on the next clock cycle. The error code is shifted
left by two bits to make room to record error flags and the algorithm transitions to State
Two. Errors in the real portion of the signal are recorded in the left bit. Errors in the
imaginary portion of the signal are recorded in the right bit. The error code saves one
error from each ROI selected for transmission. If an error is recorded, then a flag is set to
prevent additional errors from interfering with the values stored in the error code. When
the algorithm detects that a new ROI is about to be read from Temporary Memory, it
shifts the error code left by two bits to make room for the next set of flags.


Figure 41.  State Transition Diagram for the ParityFlagCtl Algorithm.

When the tmp_busy flag transitions to zero, all the values for the current data
frame have been read from Temporary Memory. The ParityFlagCtl Algorithm
transitions to State Two and sets the par_valid flag, indicating that the error code is valid.
On the next clock cycle, the algorithm transitions back to State Zero to await the next set
of values from Temporary Memory.

The ParityFlagCtl Algorithm was inserted in the Format Output
Subsystem, as shown in Figure 42. The logic gate used to control write enable for the
Downlink FIFO was adjusted so the par_valid signal would also enable writing to the
FIFO. An additional multiplexer was added to the FIFO input path, allowing the
par_code signal to be added to the data frame.


Figure 42.  Modification to Communicate Memory Errors [After 3].

As  discussed  in  Chapter  IV,  the  maximum  number  of  user  defined  ROIs  is 
set  at  15.  The  par_code  signal  is  a  32-bit  number,  which  accommodates  the  maximum 
number  of  ROIs.  The  format  of  the  signal  is  shown  in  Table  11.  The  code  must  be 
interpreted  using  the  number  of  ROIs  that  were  selected  for  transmission.  As  discussed 
in  Chapter  IV,  this  information  is  included  in  the  data  frame  header. 

Bits               15 ROI Transmitted                  One ROI Transmitted
par_code[31:30]    11                                  00
par_code[29:28]    ROI 0 [real error, imag error]      00
par_code[27:26]    ROI 1 [real error, imag error]      00
...                ...                                 ...
par_code[3:2]      ROI 13 [real error, imag error]     11
par_code[1:0]      ROI 14 [real error, imag error]     ROI 0 [real error, imag error]

Table 11.  Format of the par_code Signal.


b.  Testing 

The parity checking algorithms were tested using an input data set that
was selected to ensure that two frequency bins would be selected for downlink. On the
first test, no errors were injected. As expected, the output par_code signal was 48₁₀, or
110000₂, indicating that no error was detected. The initial error code of three was shifted
left by two bits when each successive frequency bin was sent to the Downlink FIFO.

On the second test, the output of the imaginary Dual-Port RAM was
multiplexed with an error injection circuit, similar to the one shown in Figure 38. A
constant error was injected when the free-running counter exceeded 4767 clock cycles.
This time was selected to ensure that the second frequency bin would contain errors, but
not the first frequency bin. As expected, the output par_code signal was 49₁₀, or
110001₂, indicating that an error was detected in the imaginary portion of the second
frequency bin.
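
The two test results can be reproduced with a behavioral sketch of how the par_code value is built and later interpreted on the ground. The sketch assumes that the per-ROI flags have already been reduced to one [real, imag] pair per ROI; the function names are hypothetical and the code is not derived from the ParityFlagCtl state diagram.

```python
def build_par_code(roi_flags, initial=0b11):
    """Start from the initial code, then shift left two bits per ROI and record
    the real error flag in the left bit and the imaginary flag in the right bit."""
    code = initial
    for real_err, imag_err in roi_flags:
        code = (code << 2) | (int(real_err) << 1) | int(imag_err)
    return code & 0xFFFFFFFF  # par_code is described as a 32-bit signal


def decode_par_code(code, num_rois):
    """Return [(real_err, imag_err), ...] for ROI 0 through ROI num_rois - 1."""
    flags = []
    for i in range(num_rois):
        shift = 2 * (num_rois - 1 - i)
        pair = (code >> shift) & 0b11
        flags.append((bool(pair & 0b10), bool(pair & 0b01)))
    return flags


if __name__ == "__main__":
    clean = build_par_code([(False, False), (False, False)])
    faulty = build_par_code([(False, False), (False, True)])
    print(clean, bin(clean))
    print(faulty, bin(faulty))
    print(decode_par_code(faulty, 2))
```

The decoder needs the number of ROIs from the frame header, since the position of the leading "11" marker depends on how many ROIs were transmitted.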


3.  Resource Check

The SDR circuit with error detection capability was compiled using System
Generator to generate a Xilinx ISE project. Xilinx ISE Project Navigator was used to
synthesize the design. The results of the synthesis are shown in Table 12. The table also
displays the increase in percentage of device resources used in comparison with the
baseline design without an error detection capability. As expected, the memory
utilization only increases slightly. The error detection algorithms require a considerable
increase in the number of resources used for combinational logic, but the design is still
able to fit on a Virtex™-IIP FPGA.


Logic            Used     Avail    Pct Used    Pct Increase
Slices           9304     9792     95%         27%
Flip-Flops       14925    19584    76%         23%
4-input LUTs     14234    19584    72%         25%
IOBs             96       552      17%         0%
BRAMs            59       88       67%         4%
MULT 18x18       61       88       69%         6%
GCLK             1        16       6%          0%

Table 12.  Virtex™-IIP Resources Required for Error Detection (From: [20]).


C.  CONCLUSIONS 

This chapter discussed the requirements of a circuit designed for the space
environment. Various options for implementing fault tolerance were explored. The use
of Parseval’s Theorem to check for errors in the FFT computation was introduced,
implemented, and tested. Parity-checking algorithms were added to detect faults in the
Temporary Memory Subsystem. Algorithms were added to communicate errors in the
output data frame. The resulting design was synthesized, ensuring that the SDR design
would still fit on a Virtex™-IIP FPGA. The next chapter presents a summary of this
thesis work and provides recommendations for future work.




VI.  CONCLUSION 


This  chapter  presents  a  summary  of  the  objectives  achieved  through  this  review  of 
the  initial  SDR  design.  Recommendations  for  future  work  are  provided. 

A.  CONCLUSIONS 

The methods used for Fourier Analysis were reviewed. The timing and resource
requirements of the FFTv4.1 and FFTv1.0 IP circuits provided by Xilinx were examined
using a configuration with N = 1024. The information provided in [15] and [22] was
verified through simulations with DC input signals and synthesis using System Generator
and the Xilinx ISE Project Navigator.

The  initial  SDR  design  presented  in  [3]  was  examined  using  a  configuration  with 
FFT  length  N  =  1024  and  the  number  of  FFT  periods  per  time  window  M  set  to  three. 
Internal  timing  considerations  were  clarified  using  state  transition  diagrams  and  timing 
charts  to  illustrate  the  behavior  of  the  circuit’s  control  algorithms.  General  expressions 
were  created  regarding  the  circuit’s  timing  and  resource  requirements  for  any  selection  of 
N  and  M.  These  expressions  can  be  used  as  design  equations  to  estimate  appropriate 
values  for  N  and  M  given  a  fixed  amount  of  available  memory  on  a  target  FPGA  device. 

Changes were made to increase downlink efficiency, decrease latency, and
decrease memory utilization by taking advantage of the conjugate symmetry inherent in
the FFT algorithm. The Format Output Subsystem was adjusted to improve signal flow
to an external communications system. The downlink data frame format was adjusted to
increase efficiency. One possible circuit configuration was presented that fits on a
Virtex™-IIP FPGA. An alternate configuration was provided with the circuit functions
distributed over three interconnected Virtex™-I FPGAs.
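
The conjugate symmetry referred to above can be illustrated with a short sketch. For a real-valued input of length N, the DFT satisfies X[N-k] = conj(X[k]), so the upper half of the spectrum carries no new information. The function names below are illustrative only and are not taken from the design files. The sketch keeps bins 0 through N/2 for simplicity, whereas the design described in this thesis works with N / 2 points.

```python
import cmath


def dft(samples):
    """Direct O(N^2) DFT; adequate for a small illustration."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def is_conjugate_symmetric(spectrum, tol=1e-9):
    """Check X[N-k] == conj(X[k]) for every bin k."""
    n = len(spectrum)
    return all(
        abs(spectrum[(n - k) % n] - spectrum[k].conjugate()) <= tol
        for k in range(n)
    )


def rebuild_upper_half(lower_half, n):
    """Recover bins N/2+1 .. N-1 from bins 0 .. N/2 of a real-input DFT."""
    return lower_half + [lower_half[n - k].conjugate() for k in range(n // 2 + 1, n)]
```

Because the discarded bins can be rebuilt on the ground, roughly half of the FFT output needs to be stored and downlinked, which is the source of the memory and downlink savings summarized above.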

The circuit was made more suitable for the space environment through the
addition of a fault detection capability. Options for fault detection and correction were
examined. A fault detection method using Parseval’s Theorem was designed and its
functionality was verified. Parity checking algorithms were added to detect faults in
some of the memory banks used in the design. The format of the downlink data frame
was adjusted to communicate fault status to terrestrial systems.
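
The principle behind a Parseval-based check can be sketched in a few lines. Parseval’s Theorem states that the energy of the input samples equals the energy of the DFT output divided by N, so a disagreement larger than a chosen threshold suggests a fault in the transform. The sketch below is a floating-point illustration under stated assumptions, not the fixed-point circuit described in Chapter V; the function names and the threshold value are hypothetical.

```python
import cmath


def dft(samples):
    """Direct O(N^2) DFT used as a stand-in for the FFT IP block."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def parseval_residual(samples, spectrum):
    """Absolute difference between time-domain and frequency-domain energy."""
    n = len(samples)
    time_energy = sum(abs(v) ** 2 for v in samples)
    freq_energy = sum(abs(v) ** 2 for v in spectrum) / n
    return abs(time_energy - freq_energy)


def fft_fault_detected(samples, spectrum, threshold):
    """Flag a fault when the energy mismatch exceeds the threshold."""
    return parseval_residual(samples, spectrum) > threshold
```

A check of this kind only observes total energy. A fault that leaves the output energy unchanged, such as a pure phase error in one bin, would not be flagged by this comparison alone.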

The changes discussed in Chapter V represent the final modifications to the
design for this body of work. The final configuration was saved in the Simulink® model
SDR_1024Mod8C. Appendix A lists all associated files required for compilation, as well
as the revision history of the design. Wright provided a list of items that would need to
be changed to ensure the design functions under different configuration options [3]. This
list is updated in Appendix A.

B.  RECOMMENDATIONS

This thesis work focused on the practical implementation of the algorithm
documented in [3]. To supplement the recommendations listed in [3], additional work
could be done in the following areas to improve the circuit’s capability and verify its
reliability.

1.  Bin  Overlap 

The current algorithms were written with the assumption that user-defined
frequency bins could not overlap. Although the circuit would have no difficulty
processing overlapping bins, this would lead to inefficiency in the downlink since FFT
points in the overlap region would be sent twice. In order to correct this inefficiency, the
bin range input to the re_tmp algorithm would need to be adjusted. The circuit would need
to detect if bin overlap exists and determine if both overlapping bins pass the bin
threshold analysis. If one or both of the overlapping bins fails the threshold analysis, at
most one of the two bins is downlinked, so the circuit can function normally.
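
The overlap handling described above can be sketched as follows. This is a minimal illustration, assuming each bin is given as an inclusive (start, stop) pair of FFT point indices and that the threshold results are available as a list of flags. Both the data layout and the function names are hypothetical and are not taken from the re_tmp algorithm.

```python
def overlap_points(bin_a, bin_b):
    """Number of FFT points shared by two inclusive (start, stop) bins."""
    start = max(bin_a[0], bin_b[0])
    stop = min(bin_a[1], bin_b[1])
    return max(0, stop - start + 1)


def read_ranges(bins, passed):
    """Merge overlapping bins that passed threshold analysis.

    Bins that failed are dropped, so an overlap only matters when both
    bins passed. Shared points then appear in a single merged range and
    are read once.
    """
    active = sorted(b for b, ok in zip(bins, passed) if ok)
    merged = []
    for start, stop in active:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], stop))
        else:
            merged.append((start, stop))
    return merged
```

Merging discards the original bin boundaries, so a real implementation would also need a way to convey those boundaries in the frame header if the ground system requires them.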

2.  Pipelining 

As discussed in Chapter III, this circuit does not take advantage of the pipelining
features available in the multipliers and adders. The compression portion of the circuit
uses two adders, two multipliers, and two accumulators. Pipelining these arithmetic
operations would lower the clock period, increasing the sample rate along with the
sensitivity of the circuit. This change would require some adjustments to the timing
algorithms. The pwr_time algorithm and accum_ctrl algorithm are the most likely to be
affected by pipelining. The delays inserted for timing purposes would also need to be
adjusted to accommodate pipelining.
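
The effect of pipelining on the clock period can be approximated with a simple model. Without pipeline registers, the clock period must cover the sum of the delays along the arithmetic path; with a register after each stage, it only needs to cover the slowest stage plus register overhead. The delay values used with this sketch are hypothetical and are not measurements of the design.

```python
def min_clock_period(stage_delays_ns, register_overhead_ns=0.0, pipelined=False):
    """Estimate the minimum clock period of a chain of arithmetic stages."""
    if pipelined:
        # One register after every stage: the slowest stage sets the period.
        return max(stage_delays_ns) + register_overhead_ns
    # No internal registers: the whole chain must settle in one period.
    return sum(stage_delays_ns) + register_overhead_ns


def added_latency_cycles(stage_delays_ns):
    """Extra clock cycles of latency introduced by registering each stage."""
    return len(stage_delays_ns)
```

The added cycles of latency are the reason the timing algorithms and the inserted delays would have to be revisited if pipelining were adopted.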

3.  Comprehensive  Test  Set 

The  modifications  made  in  this  thesis  work  were  tested  using  a  small  range  of 
possible  inputs  and  user-defined  configurations.  The  work  documented  in  [3]  used  a 
larger  range  of  tests.  Even  these  tests  did  not  come  close  to  testing  the  algorithm  under 
its  most  stressful  conditions.  The  circuit  needs  to  be  tested  with  the  user-defined  ROI 
maximized  over  a  period  of  time  that  would  confirm  its  ability  to  gracefully  overwrite 
obsolete memory. Additionally, it needs to be tested with input signals that are not tuned
to the sampling rate of the FFT. The fault detection algorithms should be tested with a
wider  range  of  faults  and  different  fault  thresholds  to  determine  an  optimal  configuration. 
Finally,  the  fault  detection  algorithms  could  be  tested  in  a  radiation  environment  to  verify 
that  they  perform  as  designed. 
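
The motivation for testing with signals that are not tuned to the FFT can be shown with a short sketch. A tone that completes a whole number of cycles in one FFT frame occupies a single bin, while a tone that falls between bin centers leaks energy into neighboring bins and may cause additional bins to cross a threshold. The frame length, tone frequencies, and threshold used with this sketch are illustrative assumptions.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitude of each bin of a direct O(N^2) DFT."""
    n = len(samples)
    return [
        abs(sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]


def real_tone(n, cycles_per_frame):
    """Unit-amplitude cosine with the given number of cycles per frame."""
    return [math.cos(2 * math.pi * cycles_per_frame * t / n) for t in range(n)]


def occupied_bins(samples, threshold):
    """Indices of bins in the lower half of the spectrum above the threshold."""
    mags = dft_magnitudes(samples)
    return [k for k in range(len(samples) // 2) if mags[k] > threshold]
```

Comparing `occupied_bins(real_tone(64, 8), 1.0)` with `occupied_bins(real_tone(64, 8.5), 1.0)` shows a single occupied bin in the first case and several in the second.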

4.  Improve User Interface

The  algorithm  in  its  current  form  is  vulnerable  to  user  misuse  through  poor 
configuration  choices.  The  algorithm  could  be  adjusted  to  detect  and  prevent  user- 
entered  configurations  that  would  cause  the  algorithm  to  crash  or  produce  anomalous 
output  through  unintended  memory  overwrites.  Since  the  procedure  to  set  up  the 
configuration  is  complex,  a  user  guide  should  be  developed.  The  compilation 
instructions  included  in  Appendix  A  could  be  used  as  a  baseline.  A  graphical  user 
interface  could  also  be  developed  using  MATLAB®  to  facilitate  the  setup  process. 
Finally, some decompression algorithms for the design make use of intermediate signals
sent to the MATLAB® workspace for troubleshooting. Since these signals would not be
available in the actual implementation, a user-friendly decompression algorithm using
only the final output of the SDR circuit should be developed.

5.  Explore Other Methods to Compute the FFT


This thesis explored using both the FFTv4.1 IP and the FFTv1.0 IP as the means
to compute the FFT for the SDR circuit. While the FFTv4.1 IP is compatible with the
Virtex™-IIP FPGA, neither of these IP circuits is compatible with the Virtex™-II FPGA.
If the Virtex™-II FPGA is desired as the target device, the FFTv3.1 and FFTv3.2 IP
circuits could be examined as a feasible means to compute the FFT for the SDR circuit
[15], [22], [26], [27].

This  thesis  focused  on  using  existing  IP  to  compute  the  FFT.  As  discussed  in 
Chapter  V,  this  prevented  the  use  of  internal  fault  detection  and  correction  methods  such 
as  Triple  Modular  Redundancy  and  Reduced  Precision  Redundancy.  If  this  level  of  fault 
tolerance  is  desired  for  this  circuit,  an  FFT  would  have  to  be  developed  in  the  System 
Generator  environment  using  fault-tolerant  algorithms  within  the  Cooley-Tukey 
algorithm.  An  RPR  version  of  this  algorithm  that  could  be  used  as  a  basis  for  an  FFT 
circuit  for  the  SDR  design  is  demonstrated  in  [6]. 
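
For reference, the voting step at the heart of Triple Modular Redundancy can be sketched in a few lines. Three copies of a computation run in parallel and a bitwise majority vote masks a fault in any single copy. This is a generic illustration of the technique only; it was not implemented in this thesis, and the function names are hypothetical.

```python
def tmr_vote(a, b, c):
    """Bitwise majority of three redundant integer results."""
    return (a & b) | (a & c) | (b & c)


def tmr_mismatch(a, b, c):
    """True when the three copies do not all agree."""
    return not (a == b == c)
```

Applying such a vote inside the FFT requires access to the internal arithmetic of the transform, which is why existing IP blocks prevented its use.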
APPENDIX  A.  IMPLEMENTATION  DETAILS


This appendix provides lists of the files required to run the simulations discussed
in  this  research.  A  set  of  instructions  for  simulation  and  synthesis  of  the  design  is 
provided.  Additionally,  the  list  of  modules  affected  by  changes  to  N  and  M  is  updated. 

A.  REQUIRED  FILES

A  new  Simulink®  model  was  saved  with  each  major  adjustment  to  the  design. 
All  of  these  files  and  folders  used  for  this  design  are  available  on  DVD.  The  NPS  CRL 
Lab  Manager  can  be  contacted  for  a  copy  of  the  DVD.  A  list  of  important  subdirectories 
is shown in Table 13. A list of m-files required to configure the MATLAB® environment
for  the  simulation  is  provided  in  Table  14.  A  list  of  the  Simulink®  models  representing 
different  stages  in  development  is  provided  in  Table  15.  A  list  of  m-files  required  to 
implement  algorithms  within  the  m-code  blocks  of  the  initial  SDR  design  is  provided  in 
Table  16.  Files  that  replace  or  augment  the  ones  used  in  the  initial  SDR  design  for 
follow-on models are listed in Table 17.

Sub Directory     Description
--------------    ------------------------------------------------------------------
FFTTesting        Contains all files used to test the FFTv4.1 and FFTv1.0 IP blocks
                  as discussed in Chapter II.
DurkeInit         Contains all model and m-code files developed for the thesis work
                  documented in [3].
DurkeInit/Mods    Contains all model and m-code files modified for the work
                  documented in this thesis.
FFTdev            Contains models and m-code files for an initial attempt to develop
                  an FFT algorithm in the System Generator environment using IP
                  blocks for elementary math operations.

Table 13.  Important Sub Directories Available on DVD.


File Name (.m)               Description
-------------------------    -------------------------------------------------------
input_sig_gen                Creates an input signal for SDR testing [3].
input_sig_gen2               Doubles the number of frequencies in the input signal.
ROIctrl                      Creates ROI input to SDR design [3].
Test_control_testing         File used to run tests discussed in [3].
Test_control_testing_Rev2    Modifies the input data set to focus on the first time
                             window.
Test_control_testing_Rev3    Increases the size of user-defined ROI.
Test_control_testing_Rev4    Adjusts user-defined ROI and input signals to test
                             N / 2 configuration.
Test_control_testing_Rev5    Reformats model I/O signals for a 2-chip implementation.
Test_control_testing_Rev6    Reformats model I/O signals for a 3-chip implementation.

Table 14.  M-Code Files External to the SDR Design.
Model Name (.mdl)      Description
-------------------    -------------------------------------------------------------
FFTv1Tests             Test the performance of the FFTv1.0 IP block.
FFTv4Test              Test the performance of the FFTv4.1 IP block.
SDR_1024_point_1       Initial SDR design, described in [3].
SDR_1024MOD2           Circuit modified for N / 2 compression.
SDR_1024MOD3A          Compression algorithm only. FFT computation removed.
Mod3_Chip1of3          FFTv1.0 computation only. Used in conjunction with either
                       SDR_1024MOD3A or SDR_1024Mod7Chip2B and SDR_1024Mod7Chip3B.
SDR_1024MOD4           Adds adjustments to downlink control.
SDR_1024MOD5           Reduces memory requirement of Time Windowing and Freq
                       Windowing subsystems to minimum.
SDR_1024MOD6           Optimal configuration for Virtex™-IIP. All troubleshooting
                       signals removed to track signal formats.
SDR_1024Mod7Chip2B     Window Analysis subsystems only. Chip 2 of a 3-chip
                       Virtex™-I configuration.
SDR_1024Mod7Chip3B     Temporary Storage, Format Output, and Downlink Control
                       Subsystems. Chip 3 of a 3-chip Virtex™-I configuration.
SDR_1024Mod8C          Added FFT error checking algorithms. This model includes
                       error injection algorithms for testing.

Table 15.  Simulink® Model Files Used in this Thesis Work.
Algorithm       File Name (.m)    Description
------------    --------------    ----------------------------------------------------
accum_ctrl      accum_ctrl_3_1    Manages signals to control two accumulators.
hdr_data_mgt    hdr_data_mgt      Manages signals passed to the out_hdr algorithm.
mem_pri         mem_pri           Sets a flag to use a smaller set of ROIs when in a
                                  restricted memory condition.
out_hdr         out_hdr           Produces a header for the output data frame.
pwr_time        pwr_time_1        Manages Time Windowing subsystem signals. Original
                                  design; assumes continuous FFT output.
re_freq_win     re_freq_win_1     Manages signals and addressing when reading
                                  user-defined ROIs from a dual-port RAM.
re_tmp          re_tmp_1          Manages signals and addressing when reading values
                                  out of Temporary Memory.
we_temp_fft     we_temp_fft_1     Manages signals and addressing when writing FFT
                                  output to Temporary Memory.
we_time_win     we_time_win_1     Manages signals and addressing when writing Time
                                  Window subsystem output to a dual-port RAM.
wind_anal       wind_anal_2       Manages the signals associated with evaluating the
                                  number of ROIs that pass threshold analysis.

Table 16.  M-Code Files Used in the Initial SDR Design, as Discussed in Chapter III.
Algorithm        File Name (.m)      Description of Change
-------------    ----------------    -------------------------------------------------
ErrorFlagCtl     ErrorFlagCtl        Interprets errors detected in the FFT algorithm
                                     and generates an error code as discussed in
                                     Chapter V.
out_hdr          out_hdrMod1         Produces an efficient, fixed-length header.
out_hdr          out_hdrMod1         Includes the FFT error code in the header.
OutputCtl        OutputCtlMod0       Controls reading from the downlink FIFO buffer as
                                     discussed in Chapter IV.
ParityFlagCtl    ParityFlagCtl       Saves errors detected in memory through parity
                                     checking algorithms and produces a parity error
                                     code for downlink.
pwr_time         pwr_time_MOD2       Adjusts algorithm to interpret only N / 2 points.
re_freq_win      re_freq_win_Mod1    Adjusts algorithm to interpret only N / 2 points.
re_tmp           re_tmp_Mod1         Adjusts algorithm to read only N / 2 points per
                                     FFT period.
we_temp_fft      we_temp_fft_Mod1    Adjusts algorithm to write only N / 2 points per
                                     FFT period.
we_time_win      we_time_win_Mod1    Adjusts algorithm to interpret only N / 2 points.
wind_anal        wind_anal_Mod1      Added comments to m-code for clarification.

Table 17.  M-Code Files Added or Adjusted for Changes to the SDR Design.


All files must be opened on a system configured for use with the appropriate
MATLAB®, System Generator, Xilinx ISE, and ModelSim® software, as listed in
Chapter II. Opening the design without all software correctly configured results in
unrecoverable corruption of the model file. Working from a set of backup files is
recommended until the functionality of all design tools is verified. All required files must be
located within the same directory to be recognized when the simulation is run [28].


B.  INSTRUCTIONS

The following steps describe how to use the Simulink® model of the SDR design
for testing and synthesis. Initial versions of the Test_control_testing series of m-code
files are designed to run MATLAB®/Simulink® tests simply by executing the file. This
method was not used in the development of this thesis work because it prevents the
operator from checking the intermediate progress of the test.

1.  Examine the Simulink® Model

Open  the  desired  Simulink®  model.  Check  all  m-code  blocks  and  ensure  that 
each  m-file  is  included  in  the  same  directory  as  the  model. 

2.  Conduct  Incremental  Execution  of  the  Test  File 

Open the desired m-code test file. The test files are divided into multiple cells,
each of which can be executed independently. The details of using cells are listed in the
“Rapid Code Iteration Overview” section of [28]. The beginning of a cell is identified by
a header comment in bold type. To evaluate an individual cell, move the text cursor
within the cell. This highlights the cell. Select “Cell → Evaluate Current Cell,” or type
“Ctrl+Enter” to evaluate this portion of m-code. Execute the first four cells in the file,
ending with the cell labeled “Input Signal Generation.” Check the MATLAB®
workspace to ensure that all required input variables have been assigned. Once this is
accomplished, the Simulink® model is ready for simulation and HDL generation.

The test file for models configured for the three-chip design provides additional cells
to reformat the output of each simulation so that it can be used as input for the next
simulation. After the first four cells are executed, the simulation of the first model, which
contains the FFT IP, is conducted. Once the simulation is complete, executing the cell
labeled “Reformat FFT Output” adjusts the simulation output to be used as input for the
second model, which contains the Windowing Algorithm subsystem and Window
Analysis subsystem. Once the second simulation is complete, executing the cell labeled
“Reformat Compression Output” adjusts the simulation output to be used as input for the
third model, which contains the Temporary Memory subsystem, Format Output
subsystem, and Downlink Control subsystem.

3.  Synthesis 

The process to create an FPGA configuration file from a System Generator
Simulink®  model  is  summarized  in  [3].  Specific  details  are  provided  in  [10]  and  [11]. 
For  this  research,  the  design  was  compiled  to  the  HDL  netlist  level  by  selecting  this 
option  and  clicking  “Generate”  in  the  System  Generator  GUI.  This  creates  a  Xilinx  ISE 
project,  which  can  be  opened  using  Xilinx  ISE  Project  Navigator. 

C.  CHANGING  PARAMETERS 

The  impact  of  adjusting  the  parameters  N  and  M  on  the  required  depth  of  storage 
devices  is  discussed  briefly  in  [3],  which  lists  all  storage  devices  and  associated  control 
algorithms.  This  information  was  clarified  in  Chapter  III,  which  identified  that  not  all 
storage  devices  are  sensitive  to  changes  in  N  and  M.  The  new  list  of  storage  devices 
sensitive  to  changes  in  N and  M  is  shown  in  Table  18. 


Storage Device                 Write Control Module    Read Control Module
---------------------------    --------------------    -------------------
FIFO (Time Wind)               pwr_time                pwr_time
Dual Port RAM (Freq Wind)      we_time_win             re_freq_win
Dual Port RAM (Real Data)      we_temp_fft             re_tmp
Dual Port RAM (Imag Data)      we_temp_fft             re_tmp
Dual Port RAM (Real Parity)    we_temp_fft             re_tmp
Dual Port RAM (Imag Parity)    we_temp_fft             re_tmp

Table 18.  Storage Devices Sensitive to Changes in N and M [After 3].

In addition to the storage devices listed, the FFT error detection algorithm is
sensitive to changes in N and M. As discussed in [15], changing N alters the timing
performance of the FFTv4.1 IP block. The block would need to be retested to determine
the latency between the first real signal input and the first FFT output point. This
information can be used to adjust the delay blocks used to align the error detection signal
with FFT output, as discussed in Chapter V. Comparison blocks used for timing in FFT
error control algorithms are also dependent on the value of N. Changing the value of M
requires a manual adjustment to the ErrorFlagCtl algorithm and the out_hdr algorithm
because the length of the error code is equal to M.
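
The dependence of the error code on M can be illustrated with a small sketch. It assumes, purely for illustration, that the code carries one fault flag per FFT period in a time window, with the first period in the leftmost position; the actual bit assignment used by ErrorFlagCtl and out_hdr is defined in Chapter V and may differ. The function name is hypothetical.

```python
def error_code_bits(period_faults):
    """Build an M-character bit string, one flag per FFT period.

    The ordering (first period leftmost) is an assumption made for
    this illustration.
    """
    return "".join("1" if fault else "0" for fault in period_faults)
```

With M = 3, a fault in the second FFT period would give `error_code_bits([False, True, False])`, a three-bit field. Changing M changes the field width, which is why the header format must be adjusted along with the flag logic.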

APPENDIX  B.  ADDITIONAL  APPLICATIONS


This section is classified and is bound separately. Contact the Naval Postgraduate
School Special Security Officer for access.